In their recent comprehensive review of research on college impact,
Feldman and Newcomb (1969) summarize more than 1,000 empirical studies
in which investigators have attempted to learn how students are affected
by their college experience. For the most part, the findings from these
studies are very difficult to interpret, primarily because of problems
in research design and methodology. In view of the burgeoning state of
current research on college impact, it may be useful to review these
methodological difficulties and to suggest certain ways in which future
research can be designed to avoid some of those which have plagued most
of the studies reviewed by Feldman and Newcomb.

Among the problems that will be covered are the following: single-institution
versus multi-institution studies, longitudinal versus cross-sectional
data, alternative statistical designs, the effects of measurement error,
alternative methods of measuring environmental variables, methods for
detecting student-environment interaction effects, and methods of
collecting data. Throughout the paper, however, the discussion will
focus on problems of inferring causation: that is, of determining if
and how the student is affected by his college experience.

A Conceptual Model 

For purposes of discussion, we shall utilize a model of student
development in higher education that has characterized much recent
multi-institutional research. In this model, the college can be seen as
comprising three conceptually distinct components: student outputs,
student inputs, and the college environment.

Student outputs refer to those aspects of the student's development
that the college either does influence or attempts to influence.
Although these outputs can be expressed at very high levels of
abstraction (for example, "the ultimate welfare and happiness of the
individual"), research is usually concerned with those relatively
immediate outputs that can be operationalized. Specifically, then, the
term outputs refers to measures of the student's achievements,
knowledge, skills, values, attitudes, aspirations, interests, and daily
activities. Adequate measures of relevant student outputs are, clearly,
the sine qua non of meaningful research on college impact.

Remarkably, only a handful of the studies reviewed by Feldman and 
Newcomb were concerned with the impact of colleges on cognitive outcomes. 
There are, to be sure, many hundreds of studies of academic achievement 
(Feldman and Newcomb did not review these), but such studies are usually 
concerned with predicting college grade point averages rather than with 
measuring growth or change in cognitive skills or with assessing college 
impact on such skills. Considering that the development of the student's 
cognitive skills is probably the most common educational objective of 
both students and colleges, this lack of research is unfortunate. It can 
probably be explained by the logistical problems involved in measuring 
cognitive outcomes (the necessity for proctoring and the high costs of 
achievement testing, for example), problems much more formidable than
those encountered in measuring attitudinal outcomes (which can be assessed 
with relatively inexpensive, self-administered questionnaires). 

Student inputs are the talents, skills, aspirations, and other 
potentials for growth and learning that the new student brings with him
to college. These inputs are, in a sense, the raw materials with which 
the institution has to deal. Many inputs can be viewed simply as "pre- 
tests" on certain outputs (career choice and personal values, for example), 
whereas others (sex and race, for example) are static personal attributes. 
Inputs can affect outputs either directly or by interaction with environ- 
mental variables. 

The college environment refers to those aspects of the higher
educational institution that are capable of affecting the student.
Broadly speaking, they include administrative policies and practices,
curriculum, physical plant and facilities, teaching practices, peer
associations, and other characteristics of the college environment.

The relationships among these three components of the model are
shown schematically in Figure 1. The principal concern of research on
college impact is to assess relationship "B," the effects of the college
environment on relevant student outputs. Relationship "C" refers to the
fact that outputs are also affected by inputs, and relationship "A" to
the fact that college environments are affected by the kinds of students
who enroll.

[Figure 1: Student inputs, the college environment, and student outputs,
linked by relationships A (input-environment), B (environment-output),
and C (input-output).]

In addition to the "main" effects of college environments on
student outputs (B), the investigator may also be interested in certain
interaction effects involving student inputs and college environments.
The diagram suggests that there are two types of interaction effects:
those in which the effect of input on output is different in different
college environments (AC), and those in which the effect of the college
environment is different for different types of students (AB). Research
on college impact is ordinarily concerned more with the second type.
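To make the second (AB) type concrete, the following Python sketch fits an ordinary least-squares model with an input-by-environment product term on simulated student-level data. The variable names (`ability`, `env`) and the effect sizes are invented for illustration, and a product term in a linear regression is only one common way of representing such an interaction, not a method prescribed in this paper.

```python
import numpy as np

def fit_interaction(ability, env, output):
    """Return OLS coefficients for output ~ 1 + ability + env + ability*env."""
    X = np.column_stack([np.ones_like(ability), ability, env, ability * env])
    coef, *_ = np.linalg.lstsq(X, output, rcond=None)
    return dict(zip(["intercept", "ability", "env", "ability_x_env"], coef))

rng = np.random.default_rng(0)
n = 5000
ability = rng.normal(size=n)   # hypothetical student input (mean 0)
env = rng.normal(size=n)       # hypothetical environmental measure
# Assumed truth: the environment helps low-ability students more than
# high-ability students, i.e. a negative ability-by-environment interaction.
output = (0.6 * ability + 0.3 * env - 0.2 * ability * env
          + rng.normal(scale=0.5, size=n))

coefs = fit_interaction(ability, env, output)
for name, value in coefs.items():
    print(f"{name:>14}: {value:+.3f}")
```

A clearly nonzero `ability_x_env` coefficient means that the environmental slope differs by ability level, which is the AB interaction; because `ability` is centered at zero here, the `env` coefficient is the environmental effect (B) for a student of average ability.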

Problems of Design 

Although the ideal study of college impact would incorporate infor- 
mation on all three components of the model -- student inputs, environ- 
ments, and student outputs -- most of the studies covered by Feldman and 
Newcomb lacked data on at least one of these components. In this sec- 
tion, we shall review some of the inferential problems that characterize 
such studies. 

Following the conventions of statistical inference, we can assume 
that studies of college impact should be designed to minimize two kinds 
of inferential error: 

Type I errors (rejection of the null hypothesis when it is true) 
occur when there is no college effect, but the investigator concludes 
that there is. 

Type II errors (acceptance of the null hypothesis when it is false) 
occur when there is a significant college effect, but the investigator 
concludes that there is not. 

The special problems inherent in the design of college effects
studies indicate that there is still a third type of inferential error,
which we shall call Type III errors. These occur when there is a
significant college effect, but the investigator concludes that the
opposite effect occurs. In a sense, a Type III error combines both Type
I and Type II errors, since it involves simultaneously the rejection of
a null hypothesis which is true and (implicitly) the acceptance of a
null hypothesis which is false. (A convenient mnemonic device for
defining Type III errors is that 1+2=3.)

Some of the controversy over the design of college impact studies 
stems not so much from basic disagreements over design strategy as from 
differences in the relative values assigned to Type I and Type II errors. 
Investigators who are primarily concerned about minimizing Type I errors, 
for example, fear that the highly nonrandom distribution of students among 
institutions will lead educators and students to conclude that certain 
college "effects" exist when, in fact, they do not. Thus, they regard 
adequate control of differential student inputs as an essential feature 
in their design. Researchers who are more concerned about Type II errors, 
on the other hand, fear that too much control over student inputs will 
reduce the chances of finding environmental effects. These two somewhat 
opposed emphases are in part historical. That is, the earliest
investigators of college impact exerted virtually no control over
student inputs; as a consequence, they attributed the very substantial
institutional differences in student outputs which they found to the
environmental influences of the colleges. When another group of
investigators subsequently re-examined this early work, they discovered
that differences in institutional outputs could be largely attributed
to differences in inputs and that the relative "impacts" of colleges
diminished markedly once these differential student inputs were taken
into account. Most recently, however, some investigators have been
disturbed by the possibility that a design which controls for student
inputs may tend to underestimate the differential impact of colleges
or, in some cases, to obscure particular environmental effects.

We shall discuss these and other possible effects of various
designs on both types of error in the section on Multi-institution
Longitudinal Studies (below). It should be pointed out here, however,
that very large differences in student inputs at various institutions
-- a prominent characteristic of higher education in the United States
(Astin, 1965b) -- are almost sure to make for large differences in
student outputs, regardless of the actual effects of institutions. As a
result, failure to take into account these differences in input when
studying college effects virtually guarantees that the investigator will
commit some Type I errors. More important, ignoring differential student
inputs maximizes the investigator's chances of committing Type III errors.
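A small simulation can make the Type III danger concrete. In the hypothetical Python sketch below, every college has a genuinely positive effect proportional to an environmental measure, but the colleges highest on that measure enroll students with lower input scores. Regressing output on the environment alone then points in the wrong direction, while regressing output on both input and environment recovers the sign. All numbers, and the use of least-squares regression as the adjustment, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_colleges, n_per = 50, 200

env = rng.uniform(0, 1, n_colleges)                # hypothetical environmental measure
college = np.repeat(np.arange(n_colleges), n_per)  # college index for each student
# Nonrandom distribution of students: high-env colleges attract lower-input students.
input_score = -2.0 * env[college] + rng.normal(size=college.size)
# Assumed true model: a positive environmental effect of +0.5.
output = 1.0 * input_score + 0.5 * env[college] + rng.normal(size=college.size)

def ols(columns, y):
    """Least-squares coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive_effect = ols([env[college]], output)[1]                  # ignores inputs
adjusted_effect = ols([input_score, env[college]], output)[2]  # controls inputs
print(f"naive:    {naive_effect:+.2f}")
print(f"adjusted: {adjusted_effect:+.2f}")
```

With these assumed numbers the naive slope comes out negative even though the true effect is positive, which is exactly the Type III error described above; the adjusted slope lands near +0.5.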

Studies of "Growth" or "Change" at Individual Colleges

Perhaps the prototypical study reviewed by Feldman and Newcomb 
involves the testing and retesting of students at a single institution. 
Characteristically, the students complete an attitudinal questionnaire 
or inventory when they first enter college and take it again one year 
later, four years later, or in a few cases, many years after graduation. 
Measures of "change" or "growth" are obtained by comparing the student's 
input scores from the initial administration with his output scores from 
the followup administration. (These comparative measures are usually 
simple difference scores, although residual gain scores are used 
occasionally.) In subsequently interpreting these scores, the investi- 
gator typically assumes that any observed changes are due to the students' 
experiences in college. In other words, he equates "change" with "impact." 
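The two kinds of comparative score mentioned above can be sketched in Python. The simple difference score is the output score minus the input score; the residual gain score is the output score minus the value predicted from the input score by a least-squares line fitted to the whole group. The function names and the five students' scores are hypothetical.

```python
from statistics import mean

def difference_scores(pre, post):
    """Simple gain: output (posttest) minus input (pretest) for each student."""
    return [b - a for a, b in zip(pre, post)]

def residual_gain_scores(pre, post):
    """Output minus the output predicted from the input score by an
    ordinary least-squares line fitted to the whole group."""
    mx, my = mean(pre), mean(post)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    sxx = sum((x - mx) ** 2 for x in pre)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

pre = [40, 50, 60, 70, 80]    # hypothetical scores at college entry
post = [50, 55, 62, 70, 78]   # same students, retested later
print(difference_scores(pre, post))
print([round(r, 2) for r in residual_gain_scores(pre, post)])
```

In this toy example the difference scores are largest for the students who started lowest, a pattern the residual gains largely remove. Neither score, however, says anything by itself about what the college did.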

This type of design has the advantage of focusing attention on the
longitudinal nature of student change and development in that it views
the student's output performance in relation to his input characteristics.

Its glaring weakness, however, is that it really produces no information
that bears directly on the question of environmental impact. Would the
same changes have occurred if the student had attended a different kind
of college or had not gone to college at all? In the context of our
conceptual model, this type of study yields information on student inputs
and outputs but not on the environment. Thus, the college environment is
not a variable but a constant. (The situation here is identical to the
one encountered in experimentation when no control group is used.)

The very practical danger in assuming that change equals impact can
be illustrated with an anecdote. I recently overheard a colleague from
a highly selective small college complaining that nearly a third of his
undergraduates who start out majoring in science shift to a nonscience
field before graduation. He interpreted this decline in science interest
(change) as somehow resulting from the science curriculum of the college
(impact). As a consequence, he and other members of a committee on
curriculum reform were seriously considering major changes in the science
curriculum of the college in the hope of reducing the number of students
who withdraw from science fields. As it happened, this colleague's
institution was one of several hundred colleges participating in a
longitudinal study of institutional impact on career choice (Astin &
Panos, 1969). What he did not know was that the longitudinal analyses had
revealed that the dropout rate from science was actually lower at his
college than at almost any other college in the sample. Thus, his college
was exerting a relatively positive rather than a negative influence on
its students' interest in science. Under these circumstances, major
changes in the existing science curriculum could very well increase
rather than decrease the student dropout rate from science at the college.

Many investigators use a variation of this basic design: instead of
collecting longitudinal information, they simply compare groups of
freshmen and upperclassmen simultaneously on some measure. This method
is so full of pitfalls (many of which are discussed at length by Feldman
and Newcomb) that one wonders if there is the slightest justification
for supposing that the observed "changes" are in any way related to the
college experience. In addition to the problems already mentioned, this
method carries with it potentially serious deficiencies in sampling. It
rests on the assumptions that (a) upperclassmen are a representative
sample -- at least insofar as the output variable is concerned -- of the
total cohort of freshmen from which they were drawn, and (b) this
original cohort was drawn from the same population as the current
freshmen who are being compared with the upperclassmen.

The tenuousness of these assumptions is obvious when one realizes
that any sample of upperclassmen necessarily excludes dropouts and
includes transfers -- two groups that are very likely to differ from the
students who entered as freshmen and continued on without a break in
their undergraduate progress. Moreover, changes in the nature of
successive entering freshman classes may occur as a result either of
modifications in the applicant pool or admissions practices or of
changes in the college student population itself. That such population
shifts are indeed possible -- even over a brief period of time -- is
revealed by the ACE's annual surveys of entering freshmen. For example,
during the four most recent years -- 1966, 1967, 1968, and 1969 -- the
percentages of entering freshmen who checked "none" as their present
religious preference have gone up consistently: 6.9, 7.9, 9.6, and 13.2.
These trends held true for both men and women and for students at most
types of institutions. Thus, even if no students had dropped out or
changed their religious preferences since entering college, a comparison
of the current (1969-1970) freshmen with the current senior classes at
many colleges would lead to the conclusion that nearly half of the
students who initially reported that they had no religious preference
"changed" to some other choice after entering college.
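The arithmetic behind "nearly half" can be written out. Assuming, as the example does, that no one drops out or changes preference, the current seniors still show the 6.9 percent rate of the 1966 entrants, while the current freshmen show 13.2 percent, so the apparent loss from the "none" category is

$$
\frac{13.2 - 6.9}{13.2} = \frac{6.3}{13.2} \approx 0.48 ,
$$

that is, roughly 48 percent of the "none" group would appear to have "changed," an artifact produced entirely by the shift in the entering population.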

The dangers in assuming that "change" is equivalent to "college
impact" suggest that changes in students during college should be viewed
as comprising two components: change resulting from the impact of the
college and change resulting from other influences (maturation,
noncollege environmental effects, etc.). Note that the college may (a)
bring about changes which otherwise would not occur, (b) exaggerate or
accelerate changes resulting from other sources, or (c) impede or
counteract changes resulting from other sources (as in the example, cited
above, where the college's dropout rate from science was much lower than
average).
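One way to make this two-component view operational, sketched here under a strong assumption, is to compare the observed change of a college's students with the change shown by a comparable group that did not attend it, the "control group" missing from the single-institution design. The Python below is hypothetical throughout: it assumes such a comparison group exists and is equivalent at input, which in practice is exactly what is hard to guarantee, and the 0/1 science-interest data are invented.

```python
from statistics import mean

def decompose_change(college_pre, college_post, comparison_pre, comparison_post):
    """Split the mean change of college students into a part attributed to
    other influences (estimated from a comparison group) and a remainder
    attributed to the college. Assumes the two groups are equivalent at input."""
    observed = mean(college_post) - mean(college_pre)
    other_influences = mean(comparison_post) - mean(comparison_pre)
    return {"observed change": observed,
            "other influences": other_influences,
            "college impact": observed - other_influences}

# Hypothetical 0/1 scores: 1 = interested in a science career.
college_pre, college_post = [1] * 60 + [0] * 40, [1] * 45 + [0] * 55
comparison_pre, comparison_post = [1] * 60 + [0] * 40, [1] * 35 + [0] * 65

result = decompose_change(college_pre, college_post, comparison_pre, comparison_post)
print({k: round(v, 2) for k, v in result.items()})
```

Here science interest declines at the college (a negative "change"), yet the estimated college component is positive because the decline is smaller than in the comparison group: case (c) above.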

Studies of Environments and Student Outputs

One alternative to the single-institution studies that are so
common in research on college impact is the multi-institution study, in
which the student outputs of several institutions are compared. It was
a frequent practice during the 1950's, for example, to compare
institutions on such output measures as the percentage of graduates
obtaining Ph.D. degrees or the number of alumni listed in Who's Who.
While such studies have the advantage of permitting the investigator to
study variations in college environments, the empirical findings that
result tend to be highly ambiguous because the student input has been
disregarded.

The importance of student input data in multi-institution studies
is aptly illustrated by the history of research on "Ph.D. productivity."
The earliest of these studies indicated that the graduates of certain
colleges and universities were much more likely than were the graduates
of other institutions to win fellowships for graduate study and to go on
to obtain the Ph.D. degree (Knapp & Goodrich, 1952; Knapp & Greenbaum,
1953). More important, the environments of the "highly productive"
institutions, when compared with those of the less productive ones, were
found to have higher faculty-student ratios, larger libraries, more funds
for scholarships and research, and similar resources usually assumed to
indicate institutional "excellence" and eminence. In short, the causal
inferences drawn from these early studies were that such institutional
resources are conducive to the development of the student's motivation
to seek advanced training. Among other things, this research evidence
seemed to confirm the folklore about what makes for "quality" in higher
education. Taken at face value, and assuming that the output measure
under study (motivation to seek advanced training) was relevant to the
goals of the institution, these findings offered empirical support to
the administrator in his attempts to increase the size of his faculty,
library, and so forth.

But the validity of these earlier studies came to be doubted when
it was shown that institutions differ widely in their student inputs:
Highly productive institutions, for example, enroll greater proportions
of academically able students than do less productive institutions
(Holland, 1957). Intellectually advantaged students are, of course,
more likely than are average students both to win graduate fellowships
and to be interested in pursuing the doctorate even if their institution
exerts no special influence during the undergraduate years. These doubts
were subsequently confirmed by a series of studies (Astin, 1962, 1963a,
1963b) in which differential undergraduate student inputs to diverse
institutions were controlled. Thus, when the abilities, career plans, and
socioeconomic backgrounds of the entering students were taken into
account, an institution's output of Ph.D.'s was revealed to be largely a
function of the characteristics of its entering students rather than of
its resources. Moreover, certain types of institutions that were earlier
described as "highly productive" of Ph.D.'s turned out to be
underproductive in relation to their student inputs. In addition, the
apparent "effects" of library size, faculty-student ratio, and other
similar indicators of institutional quality disappeared.

Multi-institution Longitudinal Studies

The inferential problems inherent in single-institution studies and
in those multi-institution studies that do not utilize student input data
indicate that an adequately designed study of college impact requires
information concerning all three components of our model: student inputs,
college environments, and student outputs. Merely collecting such data,
however, does not assure that true college effects will be identified and
spurious college effects will not. The avoidance of such inferential
errors depends on a number of factors, including the nature of the student
input data obtained and the statistical method used to analyze the data.
Since there is no way to guarantee that the nonrandom distribution of
students among institutions will be compensated for completely, the
investigator's task in collecting data and in selecting a statistical
method is simply to reduce the chances that his inferences will be wrong.

Student Input Data

"Relevant" student input data are those which affect the student's
choice of a college or the student output variable under study or both.
To reduce the chances of committing Type I errors, however, it is not
necessary to collect both types of data: As Figure 1 indicates, an
unbiased estimate of the environment-output effect (B) can be obtained
if either A (input-environment) or C (input-output) is controlled. If
both relationships are controlled, however, a more sensitive test of
environment-output effects will result, thus reducing the probability
of committing Type II errors.

The designs that result when student input data are controlled in
different ways can be depicted by a simple 2x2 table (Figure 2). (For
purposes of illustration, we have used the terminology of linear multiple
regression to label the different designs, although the basic logic of
the designs does not require that linear regression be the method used.)

                              Input Partialled Out of Output?
                              Yes                  No
  Input Partialled    Yes     Partial              Part
  Out of                      Correlation          Correlation I
  Environment?
                      No      Part                 Zero-Order Correlation
                              Correlation II       Between Environment
                                                   and Output

Figure 2. Four Types of Multi-institution Designs for Studying the
Relationship Between College Environments and Student Outputs.

The lower right-hand box in Figure 2 represents the multi-institutional
design in which no student input data are used (see previous section
for examples). The upper left-hand box represents the partial
correlational design in which the effects of student inputs on both
output and environment are controlled. As we have already indicated,
this design provides the most sensitive test of environmental effects.

Part correlation II (lower left box of Figure 2) involves control
over the input-output relationship but not over the input-environment
relationship.[5] Since the total output variance is likely to be more
dependent on input than on environment, this design is probably the
second most sensitive of the four. An interesting application of this
design in multi-institution studies is first to solve the regression
equations using all students, and then to aggregate the residual output
scores of students within an institution, thereby producing a mean
residual output score for the institution. This mean residual can
provide a useful quantitative measure of institutional "impact." For
example, in a recent multi-institution longitudinal study (Astin &
Panos, 1969), one of the output measures was whether or not the student
had dropped out during the four years after entering college (scored as
a dichotomy: 1 = stayed in college for four years; 0 = dropped out). The
mean college residuals on this measure, which varied among the 246
colleges from -30% to +16%, thus provided a measure of the extent to
which each college's retention rate was either above or below what
would have been expected from the characteristics of its entering
students.
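The residual-aggregation procedure just described can be sketched in Python. The sketch assumes a linear probability model (a least-squares regression of the 0/1 retention score on student inputs), which is one reasonable reading of "solve the regression equations using all students"; the five colleges, the two input variables, the true college effects, and the function name are all hypothetical.

```python
import numpy as np

def mean_residuals_by_college(inputs, output, college):
    """Regress output on student inputs across all students, then average
    each student's residual (actual minus expected) within his college."""
    X = np.column_stack([np.ones(len(output)), inputs])
    coef, *_ = np.linalg.lstsq(X, output, rcond=None)
    residual = output - X @ coef
    return {int(c): float(residual[college == c].mean()) for c in np.unique(college)}

rng = np.random.default_rng(2)
n_colleges, n_per = 5, 400
college = np.repeat(np.arange(n_colleges), n_per)
# Hypothetical inputs (say, high school grades and a family-income index);
# colleges with higher index numbers enroll somewhat stronger students.
inputs = rng.normal(size=(college.size, 2)) + college[:, None] * 0.3
true_college_effect = np.array([-0.10, 0.0, 0.0, 0.05, 0.08])   # invented
p_stay = np.clip(0.5 + 0.15 * inputs[:, 0] + 0.05 * inputs[:, 1]
                 + true_college_effect[college], 0, 1)
stayed = (rng.uniform(size=college.size) < p_stay).astype(float)  # 1 = stayed

res = mean_residuals_by_college(inputs, stayed, college)
for c, r in res.items():
    print(f"college {c}: {100 * r:+.1f} percentage points vs. expected")
```

Because the regression pools all students, each college's mean residual reads directly as a retention rate above or below expectation, in the same units as the -30% to +16% range reported above.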

The fourth design shown in Figure 2 -- part correlation I -- involves
control over the input-environment relationship but not over the
input-output relationship. Although this method has seldom been used, it
yields an estimate of the efficiency of a particular environmental
variable (or combination of environmental variables). By "efficiency"
we mean the extent to which the total variation in student performance
on the output measure can be attributed solely to the operation of
environmental variables.
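The four cells of Figure 2 can be computed side by side. The Python sketch below uses invented data in which the environment has no effect at all (the null hypothesis is true) but students are distributed nonrandomly among environments. It computes the zero-order correlation, part correlation I (input removed from the environment only), part correlation II (input removed from the output only), and the partial correlation (input removed from both). Residualizing by least squares on a single input is a simplifying assumption.

```python
import numpy as np

def residualize(y, x):
    """Residual of y after least-squares regression on x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def figure2_correlations(inp, env, out):
    env_r, out_r = residualize(env, inp), residualize(out, inp)
    return {
        "zero-order": corr(env, out),
        "part I (input out of environment)": corr(env_r, out),
        "part II (input out of output)": corr(env, out_r),
        "partial (input out of both)": corr(env_r, out_r),
    }

rng = np.random.default_rng(3)
n = 20000
inp = rng.normal(size=n)              # student input
env = 0.7 * inp + rng.normal(size=n)  # relationship A: inputs shape environments
out = 0.8 * inp + rng.normal(size=n)  # relationship C only: no true effect B

r = figure2_correlations(inp, env, out)
for design, value in r.items():
    print(f"{design:>34}: {value:+.3f}")
```

Under this null model the zero-order correlation is clearly positive, a potential Type I error, while each of the three designs that control A, C, or both stays near zero.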

A final question relating to student input data concerns the data 
which are actually used. Investigators who are used to thinking in 
terms of experimental rather than correlational models run the risk of 
utilizing only a single student input measure, a "pretest" measure or 
"covariate." (Investigators who regard research on college impact as 
simply a matter of "change" will be similarly tempted to rely on a 
single input measure.) The problem with single input measures is that 
they are almost sure to be inadequate, since the distribution of student 
inputs among institutions is biased with respect not just to one but to 
many student attributes (Astin, 1965b; Holland, 1959). Because the 
factors influencing college choice are often difficult to identify, 
probably the best protection for the researcher here is to measure and 
control all student attributes that are likely to affect the output 
measures under study. 

That using only a single "pretest" measure in studying college
impact can seriously bias the conclusions is illustrated by a recent
study in which the three Area Tests on the Graduate Record Examination
(GRE) were used as output measures (Astin, 1968b). In all three analyses,
the student's initial ("pretest") aptitude entered as the first variable
in the stepwise regression. In two of the three analyses, however, the
student's sex entered with a large weight at the second step. The sex
ratio in the student body, it should be noted, is also strongly related
to environmental attributes such as Cooperativeness, Cohesiveness, and
Femininity (Astin, 1968a). Clearly, if initial aptitude had been the
only student input variable considered, the findings might have shown
-- incorrectly -- that the student's learning and achievement are
significantly affected by the degree of cooperativeness, cohesiveness,
or femininity of the college environment.
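The GRE example can be mimicked with invented data. In the Python sketch below, output depends on aptitude and sex but not at all on a hypothetical "femininity" score of the college environment, which in turn tracks the proportion of women enrolled. Controlling aptitude alone leaves a spurious environmental coefficient, and adding sex as a second input removes it. Variable names, effect sizes, and the use of ordinary least squares are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_colleges, n_per = 100, 100
college = np.repeat(np.arange(n_colleges), n_per)

share_female = rng.uniform(0.1, 0.9, n_colleges)                 # varies by college
female = (rng.uniform(size=college.size) < share_female[college]).astype(float)
femininity = share_female + rng.normal(scale=0.05, size=n_colleges)  # environmental score
aptitude = rng.normal(size=college.size)                         # the single "pretest"
# Assumed true model: no environmental effect; sex matters in its own right.
output = 1.0 * aptitude + 0.8 * female + rng.normal(size=college.size)

def env_coefficient(controls):
    """OLS coefficient on the environmental score, given a list of input controls."""
    X = np.column_stack([np.ones(college.size), *controls, femininity[college]])
    coef, *_ = np.linalg.lstsq(X, output, rcond=None)
    return coef[-1]

only_aptitude = env_coefficient([aptitude])
with_sex = env_coefficient([aptitude, female])
print(f"aptitude only:  {only_aptitude:+.2f}")
print(f"aptitude + sex: {with_sex:+.2f}")
```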

Statistical Alternatives 

Three basically different statistical methods have been used to
analyze input, output, and environmental data: matching, actuarial
tables, and linear multiple regression analysis.

Matching. Perhaps the least desirable statistical approach is to
match students entering different colleges in terms of their input 
characteristics. Not only are many subjects lost, but also the sub- 
samples of students selected for study are unrepresentative of their 
institutions; thus potentially serious regression artifacts are intro- 
duced into the data. These and other problems with matching designs 
have been discussed at length elsewhere (e.g., Campbell & Stanley, 1963). 
Briefly, the major inferential problem is that the analysis is likely 
to yield artifactual "effects" which are in reality the result of 
errors of measurement in the input variables. (These and other problems 
associated with measurement errors will be discussed in the next section.) 

It is not generally recognized that research on interaction effects
ordinarily employs a kind of matching design and is therefore liable to
the same deficiencies. That is, if one sorts out his students in terms
of attributes such as ability, sex, race, and so forth, his findings
on the effects of college variables are likely to be biased by errors
of measurement in these student attributes. For example, if students
are stratified by ability level (in order to detect possible
interaction effects between ability and some college characteristic),
the students in any given ability category will include representative
subsamples of students from some colleges and highly nonrepresentative
subsamples from others. Since the error of measurement in any student's
ability test performance is likely to be correlated with the extent to
which his score deviates from the mean score of his classmates (i.e.,
students with relatively high scores being more likely to have positive
errors of measurement than students with relatively low scores), the
"superior" students from the least selective colleges are more likely to
have spuriously high ability test scores (i.e., positive measurement
errors) than are the "superior" students from the most selective colleges.
Thus, such studies may find that college selectivity, or some variable
correlated with selectivity, "affects" highly able students, when in
fact there is no effect. (For a fuller description of this phenomenon,
see the next section.)
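The measurement-error artifact described above can be reproduced in a few lines. In this hypothetical Python sketch, two colleges differ only in the mean true ability of their entrants, observed test scores add random error, and students are classified as "superior" if their observed score exceeds the same cutoff at both colleges. The error standard deviation, the cutoff, and the sample sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
error_sd, cutoff = 0.6, 1.5   # hypothetical measurement error and "superior" cutoff

def superior_group_means(mean_ability):
    """Mean true and observed ability of students whose observed score
    exceeds the cutoff, at a college with the given mean true ability."""
    true = rng.normal(mean_ability, 1.0, n)
    observed = true + rng.normal(0.0, error_sd, n)   # fallible input measure
    superior = observed > cutoff
    return true[superior].mean(), observed[superior].mean()

results = {}
for label, mu in [("less selective", 0.0), ("more selective", 1.0)]:
    true_mean, obs_mean = superior_group_means(mu)
    results[label] = obs_mean - true_mean            # mean measurement error
    print(f"{label}: observed {obs_mean:.2f}, true {true_mean:.2f}, "
          f"mean error {obs_mean - true_mean:+.2f}")
```

The "superior" group at the less selective college carries the larger average positive error, so any output that depends on true ability would make the selective college appear to "affect" its able students even though neither college has any effect.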

Even interaction studies that use highly objective student input
characteristics, such as race and sex, are not free from these artifacts,
since the measurement of these attributes too is likely to contain some
error: A few women will probably be misclassified as men, a few
nonwhites misclassified as whites, and so forth. The bias occurs because
such errors are not equally probable in all types of institutions. Thus,
there is likely to be more error of measurement in classifying students
as "white" (i.e., more nonwhites) among those attending predominantly
Negro colleges than among those attending predominantly white colleges.
Similarly, the chances are probably greater that students classified
as "female" are really male if they are attending a technological
institution than if they are attending a teachers college.

Actuarial Tables. The second basic type of design involves the use
of actuarial tables for controlling differential student inputs. Actuarial
tables are especially helpful when the input variables are qualitative
rather than quantitative in nature (Astin, 1962, 1963a). Briefly, what
the investigator does is to sort his total pool of subjects into discrete
cells on the basis of their input attributes (by sex, by race, by family
SES, and so forth). Cells need not, of course, be balanced (e.g., one
might form separate sex cells for one race but not for another). The
purpose of the sorting procedure is to generate new cells in such a way
that the between-cell variance in the output measure is maximized and
the within-cell variance is minimized. The actuarial approach is similar
in some ways to multiple group discriminant analysis, except that the
roles of the independent and dependent variables are reversed. In
discriminant analysis, the groups define the dependent variable and the
independent variables are used to form a metric which maximally
discriminates the groups. Conversely, in actuarial analysis, the metric
is a given (the dependent variable), and the independent variables are
used to form groups which maximally discriminate this metric.**

At the point where it is no longer possible to form further groups 
which significantly discriminate with respect to the output variable, 
the "predicted" or "expected" output score for each student becomes the 
mean output score of all students occupying the cell where he is located.
(One refinement, which might be desirable for use with relatively small 
samples, is to exclude a given subject in computing the mean for his cell.) 
The difference between the expected value and the student's actual score 
on the output measure thus becomes the dependent variable for analysis of 
college impact. Students can be sorted into their respective colleges, 
and the mean discrepancy score computed separately for each college. At 
this stage, one has a situation similar to that described above for part 
correlation II (Figure 2). 
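The actuarial procedure can be sketched in Python. For simplicity the sketch takes the cells as given (here, combinations of two hypothetical qualitative inputs) rather than searching for the partition that maximizes between-cell variance, for which, as noted below, there is no generally accepted analytic method. It applies the optional leave-one-out refinement and then averages the actual-minus-expected discrepancies within each college. All names and data are hypothetical.

```python
from collections import defaultdict

def actuarial_discrepancies(students, leave_one_out=True):
    """students: list of dicts with keys 'college', 'cell', and 'output'.
    Returns the mean (actual - expected) output score for each college,
    where 'expected' is the mean output of the student's input cell."""
    cell_sum, cell_n = defaultdict(float), defaultdict(int)
    for s in students:
        cell_sum[s["cell"]] += s["output"]
        cell_n[s["cell"]] += 1

    college_sum, college_n = defaultdict(float), defaultdict(int)
    for s in students:
        total, count = cell_sum[s["cell"]], cell_n[s["cell"]]
        if leave_one_out and count > 1:
            # Refinement: exclude the student himself from his cell mean.
            total, count = total - s["output"], count - 1
        expected = total / count
        college_sum[s["college"]] += s["output"] - expected
        college_n[s["college"]] += 1
    return {c: college_sum[c] / college_n[c] for c in college_sum}

# Hypothetical data: cells formed by sex and race; output = 1 if stayed in college.
students = [
    {"college": "A", "cell": ("M", "white"), "output": 1},
    {"college": "A", "cell": ("F", "white"), "output": 1},
    {"college": "A", "cell": ("M", "nonwhite"), "output": 0},
    {"college": "B", "cell": ("M", "white"), "output": 0},
    {"college": "B", "cell": ("F", "white"), "output": 0},
    {"college": "B", "cell": ("M", "nonwhite"), "output": 1},
]
print(actuarial_discrepancies(students))
print(actuarial_discrepancies(students, leave_one_out=False))
```

The per-college means play the same role as the mean residuals of part correlation II; with real data the cells would be far larger, which is why the method needs very large samples.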

The principal advantages of the actuarial table are that it is pro- 
bably much easier to understand than standard regression techniques (below) 
and that it permits the investigator to take into account interaction 
effects among student input variables as well as nonlinear effects of 
input variables. Its principal disadvantage is that there is no generally 
accepted analytic method for determining how the cells should be formed; 
the number of possible cells increases exponentially as the number of in- 
put variables increases. Moreover, to use the method effectively, one 
must have a very large sample of subjects. Nevertheless, it has been 
shown (Astin, 1962, 1963a) that actuarial tables can produce wide separa- 
tions on student output measures and that, given large enough samples, 
the cell means prove to be highly stable on cross-validation. While the 
actuarial approach does not easily accommodate continuous (rather than
qualitative) variables, it is possible, under certain conditions,
to combine actuarial tables with regression analyses (see below). 

Linear Multiple Regression. The statistical method used most fre-
quently in recent multi-institution longitudinal studies is linear multiple
regression. This technique can be applied in three basically different
ways. The first, and perhaps the most straightforward, approach is the
"full model" described by Bottenberg and Ward (1963), in which the out-
put measure is regressed on both input and environmental variables, with
the student used as the unit of analysis.

The second application is identical to the first, except that the 
institution rather than the student is used as the unit in the analyses 
both of environmental effects and of input effects. Mean scores on 
each input variable are calculated separately for each institution;
then the output variable is regressed on these mean input variables,
after which the environmental variables are permitted to enter the analysis. 
Although the much smaller number of units involved (institutions versus 
students) makes this method computationally much simpler than the first, 
it should be used with caution because it greatly increases the probability 
of type II errors. The major problem here is that the method treats peer 
group effects as input, rather than environmental, effects. That is, 
many potentially important environmental variables are a reflection of 
(or at least highly correlated with) the aggregate or mean score on 
particular student input characteristics (the mean ability level of the 
student body, for example). If the magnitude of a particular environmental
effect is proportional to the institution's mean score on a particular 
input variable, this method may partial out the environmental effect along 

with any input variable effects. ^ 
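
A small simulation can make the peer-effect problem concrete. It is a sketch under stated assumptions, not an analysis of real data:

- The environmental variable is taken to be the observed mean ability of each student body.
- Output is generated with an assumed true peer effect of 0.5.
- The helper `r_squared` is illustrative.

With the student as the unit of analysis, the peer measure adds explained variance beyond individual ability. With the institution as the unit, it adds none, because it is identical to the mean input score that has already been controlled.

```python
import numpy as np


def r_squared(y, *predictors):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n_colleges, n_per_college = 40, 50
college = np.repeat(np.arange(n_colleges), n_per_college)

# Hypothetical input: entering ability, with colleges differing in mean ability.
ability = rng.normal(0.0, 1.0, n_colleges)[college] + rng.normal(0.0, 1.0, college.size)
# Environmental variable: mean ability of each student body (a peer measure).
peer = np.array([ability[college == j].mean() for j in range(n_colleges)])
# Assumed true model: output depends on own ability and on peer ability.
output = ability + 0.5 * peer[college] + rng.normal(0.0, 1.0, college.size)

# Method 1: student as the unit of analysis.
student_gain = r_squared(output, ability, peer[college]) - r_squared(output, ability)

# Method 2: institution as the unit; the mean input score is identical to `peer`.
mean_output = np.array([output[college == j].mean() for j in range(n_colleges)])
mean_ability = peer.copy()
institution_gain = (r_squared(mean_output, mean_ability, peer)
                    - r_squared(mean_output, mean_ability))

print(f"R^2 gain for peer measure, student level:     {student_gain:.3f}")
print(f"R^2 gain for peer measure, institution level: {institution_gain:.3f}")
```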

The third application of regression analysis, which was alluded to
in the previous discussion of actuarial tables, combines the first two
uses. The output variable is first regressed on input variables using
the student as the unit of analysis. Mean residuals are then computed
separately by college, and the effects of environmental variables 
assessed using the institution as the unit of analysis. This applica- 
tion is useful when the magnitude of the effects of particular colleges 
is being investigated. Moreover, in cases where the output measurement 
has a meaningful zero point (the percent of students who are affected 
in some way, for example) , the third method offers some interesting 
possibilities for further analyses of the mean residuals. For example, 
one might conduct a series of analyses using several different outcome
criteria, each of which has been scored dichotomously so that the mean 
residual indicates the percentage of students at a given college who 
were differentially affected by that college. An empirical typology of 
colleges could then be developed by factoring the covariances (rather 
than the correlations) among the mean residuals obtained on the several 
different output variables. The resulting typology would thus give 
greatest weight to those environmental influences that affect the 
largest percentage of students (i.e., to those output variables whose 
mean residuals show the greatest interinstitutional variance). 
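
The two stages of the third method can be sketched as below. The simulation is hypothetical. The environmental measure (called `env` here) is generated independently of student input and given a true effect of 0.3, so the slope of the mean residuals on `env` should come out near that value. The helper `mean_residuals_by_college` is illustrative. Real data would, of course, contain some confounding between input and environment.

```python
import numpy as np


def mean_residuals_by_college(y, inputs, college):
    """Stage 1: regress output on input variables with the student as the
    unit of analysis; return each college's mean residual."""
    X = np.column_stack([np.ones(len(y)), inputs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ids = np.unique(college)
    return ids, np.array([resid[college == c].mean() for c in ids])


rng = np.random.default_rng(1)
n_colleges, n_per_college = 60, 100
college = np.repeat(np.arange(n_colleges), n_per_college)
env = rng.normal(0.0, 1.0, n_colleges)          # hypothetical environmental measure
ability = rng.normal(0.0, 1.0, college.size)    # hypothetical student input
output = ability + 0.3 * env[college] + rng.normal(0.0, 1.0, college.size)

ids, mean_resid = mean_residuals_by_college(output, ability, college)

# Stage 2: institution as the unit; regress mean residuals on the environment.
slope, intercept = np.polyfit(env[ids], mean_resid, 1)
print(f"estimated environmental effect: {slope:.2f}")
```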

A similar two-stage analysis can be carried out using the first 
two regression methods. Student input variables are first permitted to 
enter the regression equation, after which environmental measures are 
permitted to enter. Some investigators object that such a two-stage 
analysis biases the findings in favor of student input, as opposed to 
environmental, variables, but this supposed "rivalry" between the two 
types of variables is something of a straw man. Student input variables 
are controlled prior to the assessment of environmental effects for two
reasons. First, there is the practical problem of reducing type I and
type III errors. Unless some control over differential student inputs 
is exerted prior to the assessment of environmental effects, the investi- 
gator maximizes his chances of committing both of these types of inferen- 
tial errors. Second, there is the logical question of the temporal 
sequence of student input and environmental variables. While the college 
environment clearly can be influenced by the nature of the student input, 
it is illogical to assume that the student's input characteristics have 
been affected by the environment of his college. That is, the student's 
sex, race, SES, initial aptitude, and other input variables are set before 
he has any opportunity to be exposed to the college environment. It is 
true, of course, that the entering student's plans or attitudes may 
already have been influenced by his expectations about the college or by 
his having been accepted by the college for admission, but we are con- 
cerned here with the "environmental effects" that occur only after he 
matriculates. 

One way of regarding the problem of what percentage of the total 
output variance can be attributed to input or environmental variables is 
to conceive of the predictable variance in any output measure as comprising 
three conceptually and statistically separate components: 

$R^2_{PART(I \cdot E)}$: The percentage of output variance uniquely attributable
to student input variables. This quantity refers to Part Correlation
II (Figure 2) and is the squared multiple correlation between the output
measure and the residual input variables (i.e., input independent of
environmental variables).

$R^2_{PART(E \cdot I)}$: The percentage of output variance uniquely attributable
to college environmental variables. This quantity refers to Part
Correlation X (Figure 2) and is the squared multiple correlation between the
output measure and the residual environmental variables (i.e., environmental
variables independent of input variables).

$R^2 - R^2_{PART(I \cdot E)} - R^2_{PART(E \cdot I)}$: The percentage of confounded
output variance, or output variance which is jointly attributable to input
and environment.

The first two coefficients provide "lower-bounds" estimates of the
total output variance that can be attributed, respectively, to input and
environmental sources. An alternate method of computing "lower-bounds"
estimates would be simply to determine how much $R^2$ increases when one set
of variables is added to the other. This latter approach, however, may
give too high an estimate because it would assign all "suppressor" effects
between the two sets of variables to the second set (i.e., the set being
added to the equation).

"Upper-bounds" estimates can be obtained simply by adding the "lower-bounds"
estimates to the confounded variance. An alternative method would be
simply to leave out one set in computing $R^2$. This alternate
"upper-bounds" estimate, however, may be too low because it does not
capitalize on any possible suppressor effects between environmental and
input variables.
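
The partition of predictable variance and the bounds built from it can be computed directly from the definitions above. This is a minimal sketch with these assumptions:

- The input and environmental measures are held in numeric arrays.
- The simulated data are hypothetical.
- The helper names `r2`, `residualize`, and `variance_partition` are illustrative.

Each part correlation is obtained by regressing the output on one set of variables after the other set has been partialled out of that set. The confounded component is whatever remains of the total $R^2$.

```python
import numpy as np


def r2(y, *predictors):
    """Squared multiple correlation from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def residualize(a, b):
    """The part of `a` (one or more columns) that is independent of `b`."""
    B = np.column_stack([np.ones(len(a)), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef


def variance_partition(y, inputs, env):
    total = r2(y, inputs, env)
    part_ie = r2(y, residualize(inputs, env))   # R^2_PART(I.E)
    part_ei = r2(y, residualize(env, inputs))   # R^2_PART(E.I)
    confounded = total - part_ie - part_ei      # can be negative with suppressors
    return {
        "total": total,
        "confounded": confounded,
        "input": (part_ie, part_ie + confounded),        # (lower, upper) bounds
        "environment": (part_ei, part_ei + confounded),  # (lower, upper) bounds
    }


rng = np.random.default_rng(2)
n = 2000
inputs = rng.normal(size=(n, 2))                      # two hypothetical input measures
env = 0.6 * inputs[:, 0] + rng.normal(0.0, 0.8, n)    # environment correlated with input
y = inputs @ np.array([0.5, 0.3]) + 0.4 * env + rng.normal(size=n)

for key, value in variance_partition(y, inputs, env).items():
    print(key, np.round(value, 3))
```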

During the past several years, some discussion has appeared in the 
literature concerning the most appropriate use of multivariate analysis 
in analyzing student input, environmental, and student output information.
Some writers (Werts & Watley, 1969) prefer to pool all input and environ-
mental variables in a single analysis rather than to use the two-stage
input-environment analysis. The resulting regression coefficients,
according to these writers, would reflect the "independent contribution" 
of various input and environmental variables in accounting for variation 
in the output variable. One interpretive difficulty with this method is 
that the various input and environmental variables are not independent. 
Under such conditions, some writers have concluded that "the notion of
'independent contribution to variance' has no meaning when predictor
variables are intercorrelated" (Darlington, 1968, p. 169). The problem
here is essentially one of what happens to the confounded variance.
Since this variance must be reflected in the regression coefficients, there
is no way to determine with certainty from these coefficients just how much of
the confounded versus unique variance has been allotted to any independent 
variable or class of variables. Another problem is that the regression 
coefficients do not show whether a particular variable is acting directly 
on the output variable or whether it is operating primarily as a suppressor 
variable by accounting for extraneous variance in other independent 
variables. 
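
The suppressor problem is easy to reproduce. In the hypothetical simulation below, `x2` is uncorrelated with the output but carries the extraneous variance that contaminates `x1`. It nevertheless receives a large negative regression weight. The weight alone would therefore not reveal that `x2` acts only by removing irrelevant variance from `x1`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
relevant = rng.normal(size=n)     # the part of x1 that matters for the output
extraneous = rng.normal(size=n)   # variance in x1 that is irrelevant to the output

x1 = relevant + extraneous
x2 = extraneous                   # hypothetical suppressor variable
y = relevant + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"r(y, x2) = {np.corrcoef(y, x2)[0, 1]:+.3f}")
print(f"weights:  x1 = {b[1]:+.2f}, x2 = {b[2]:+.2f}")
```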

A possible solution to these problems associated with regression 
weights would be to compute "lower-bounds" estimates of the unique influ-
ence of a particular variable or class of variables by means of squared 

part correlations and then to compute "upper-bounds" estimates by adding 

9 

the confounded variance to the squared part correlations. The investi- 
gator could then evaluate these two estimates in terms of the various 
risks that he is willing to take of incurring type I and type II errors. 
Obviously, the greater the discrepancy between the upper- and lower- 
bounds estimates, the greater the risks. 
A variation in the use of regression analysis proposed by some
authors is causal path analysis (Duncan, 1966). Perhaps its major
advantage over ordinary linear regression analysis is that it forces
the investigator to specify the known or hypothesized relationships
among his input, environmental, and output variables and aids him in 
differentiating "direct" from "indirect" influences on output variables. 
Path diagramming can also be a useful way of helping the investigator to 
see possible connections among his variables that he had not considered 
previously (Werts, 1968). Perhaps the major limitation of this method 
is that it can be unwieldy or even unworkable when the number of
independent variables is large or when their temporal sequencing is not
known. Since path analysis is more useful for testing specific causal 
hypotheses than for an open-ended exploration of college impact, its 
use should probably be confined to situations where the number of inde- 
pendent variables is relatively small, and where their interrelationships 
are relatively well understood. 
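
For the simplest recursive model, in which input affects environment and both affect output, the direct and indirect effects can be read off two regressions. This is a sketch with hypothetical path values, and the helper `slopes` is illustrative. It relies on a standard ordinary least squares identity: the total effect of input equals its direct effect plus the product of the two paths through the environmental variable.

```python
import numpy as np


def slopes(y, *xs):
    """OLS slopes (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


rng = np.random.default_rng(4)
n = 3000
inp = rng.normal(size=n)                             # student input
env = 0.6 * inp + rng.normal(size=n)                 # environment depends on input
out = 0.5 * inp + 0.4 * env + rng.normal(size=n)     # output depends on both

(a,) = slopes(env, inp)            # path: input -> environment
direct, b = slopes(out, inp, env)  # paths: input -> output, environment -> output
(total,) = slopes(out, inp)        # total effect of input, ignoring environment

indirect = a * b
print(f"direct {direct:.3f} + indirect {indirect:.3f} = {direct + indirect:.3f}; "
      f"total {total:.3f}")
```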

Causal analysis of input, output, and environmental data by means
of multiple regression techniques, as a general approach to studying
college impact, has been criticized by Richards (1966) (who is also
cited at length by Feldman and Newcomb, Appendix F) on the grounds that
residual values are "notoriously unreliable and subject to errors of
various sorts." But residual values are no less "reliable" than differ- 
ence scores or even the change scores which Richards himself recommends 
as alternatives. Richards also objects to the regression approach 
because it can "obscure true college effects." In this regard, it is 
important to note that the first and third uses of regression analysis 
described earlier (i.e., the ones where inputs are controlled using
the student as the unit of analysis) will not "obscure" even very 
small environmental effects, except in the special case where there 
is total confounding of input and environmental variables. (It would 
be difficult, for example, to compare the effects of men's colleges and 
women's colleges on some output variable that is related to sex.)
Under these circumstances, there is no within-college variance in input,
so that the environmental and input variables are completely confounded. 
However, as long as there is some overlap between institutions in 
student input characteristics, the application of regression analysis 
described above will not obscure any college effects, no matter how 
small. 
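
The special case of total confounding shows up as a rank-deficient design matrix. In this hypothetical sketch the environment is represented by dummy variables for four colleges, two all-male and two all-female. That representation and the helper `design_rank` are assumptions. Sex is then a linear combination of the college dummies, so its effect cannot be separated from the college effects. Letting a single student differ in sex from his classmates restores full rank.

```python
import numpy as np


def design_rank(sex, college, n_colleges):
    """Rank and column count of [intercept, sex, college dummies 1..n-1]."""
    dummies = (college[:, None] == np.arange(1, n_colleges)).astype(float)
    X = np.column_stack([np.ones(college.size), sex, dummies])
    return int(np.linalg.matrix_rank(X)), X.shape[1]


college = np.repeat(np.arange(4), 5)
sex_confounded = (college < 2).astype(float)   # colleges 0-1 all male, 2-3 all female
sex_overlap = sex_confounded.copy()
sex_overlap[0] = 0.0                           # one woman enrolled at college 0

print(design_rank(sex_confounded, college, 4))  # (4, 5): rank-deficient
print(design_rank(sex_overlap, college, 4))     # (5, 5): full rank
```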

Detecting Interaction Effects 

In presenting our three-component model of college impact, we 
indicated that at least two kinds of student-college interaction effects 
can occur: those in which the effect of input on output is different
in different college environments and those in which the effect of the
environment is different for different types of students.¹⁰

In certain respects, the problem of interaction effects between 
student and college characteristics has more practical significance 
for administrative policy than the problem of the main effects of 
college environmental variables. A knowledge of environmental main 
effects is useful only when it is possible to modify existing colleges 
or to design new colleges in ways which will maximize the desired main 
effects. A knowledge of interaction effects, however, can be useful
even if there is no realistic possibility of making significant changes in
existing college environments, since such knowledge permits one to
maximize desired educational objectives by redistributing students
among existing institutions in the most efficacious way. Such
knowledge is of obvious value to large city or state systems comprising
several institutions; it can also be useful to individual private 
colleges in selecting students who are likely to benefit most from 
the particular program offered by the institution. 

Knowledge concerning interaction effects can also be applied 
within individual institutions. It can be used, for instance, as a 
basis for selecting those students most likely to profit from coun- 
seling and guidance in situations where resources for these services 
are limited, or for assigning students to various schools and colleges 
within an institution. Even if the final decision is left to the 
individual student, information about interaction effects can help 
him to make the most appropriate choice. 

Assessing interaction effects presents many methodological problems,
primarily because the number of possible student-environmental
interaction effects is so large. Simply to "shotgun" the study of
interaction effects by generating all possible combinations is usually
unrealistic, either because of the large loss in degrees of freedom
or because of limits on the number of variables that can be accommodated
in a given analysis.

Perhaps the most common approach is to generate only those inter- 
action terms suggested by a particular theory. However, the paucity 
of comprehensive theory in this field greatly limits the range of 
interaction terms that one can explore in this manner. 

Another approach is to select a limited number of student input 
variables on the basis of their intrinsic importance (sex, race, SES,
ability, for example) and to determine which are most likely to inter- 
act with environmental variables across a wide range of student out- 
comes. Future studies of college impact could then routinely examine 
interaction effects involving such variables. 

There are many possible methods for assessing interaction effects 
in the multivariate model. The simplest (and probably most expensive) 
is to perform separate analyses on subgroups of students (all men, 
for example) defined in terms of the student characteristics that might 
interact with environmental variables. Another approach is to perform 
only one analysis, but to "score" the interaction terms (student ability 
x college size, for example) as a separate variable. This method has 
the advantage of computational simplicity, provided that the number of 
interaction terms is not excessively large. If the investigator wishes 
to assess such interaction terms using a very large number of environ- 
mental variables, the former method of separate analyses by subgroups 
is probably preferable. 

Whatever method the investigator uses, he cannot be sure he has 
identified significant interaction effects until he has first controlled 
for the main effects of the variables that make up the interaction 
term. (The problem here is similar to the one encountered in analysis 
of variance designs, where the main effects of the independent variables 
must first be removed before the interaction effects can be studied.) 
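
The need to control main effects first can be shown with a simulation that contains no interaction at all. The variables `ability` and `size` and their scales are hypothetical, as is the helper `r2`. Because both variables have nonzero means, the product term `ability * size` correlates with the output through the main effects alone. Yet it adds essentially nothing once the main effects are in the equation.

```python
import numpy as np


def r2(y, *predictors):
    """R^2 from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(5)
n = 4000
ability = rng.normal(50.0, 10.0, n)   # hypothetical student input
size = rng.normal(5.0, 2.0, n)        # hypothetical college characteristic
y = 0.3 * ability + 1.0 * size + rng.normal(0.0, 5.0, n)   # main effects only
product = ability * size              # the "scored" interaction term

raw_r = np.corrcoef(y, product)[0, 1]
increment = r2(y, ability, size, product) - r2(y, ability, size)
print(f"zero-order r(y, ability x size)  = {raw_r:.2f}")
print(f"R^2 increment after main effects = {increment:.4f}")
```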

Many investigators who study college impact fail to recognize the 
need for controlling main effects before examining interaction effects. 
Take, for example, studies of the "congruence" between the student and 
his college, which are designed to test the assumption that the
student's success and his satisfaction with college will be related
to the degree of similarity between certain of his own characteristics 
and some comparable measure of the college environment. In some cases, 
the student's personality is compared with the typical personality of 
his fellow students; in others, his expectations about the college 
environment are compared with some "objective" measure of that environ- 
ment. Whatever the measure used, such studies do not yield evidence 
on the importance of congruence unless one first examines how the stu- 
dent input and the environmental variables directly affect the output 
under consideration. In the single-institution study, of course, the 
main effects of the environmental variables cannot be tested since 
the environment is essentially a constant rather than a variable. Even 
in multi-institution studies, however, the main effects of the student
input and the environmental characteristics in question have first to
be controlled.

The stepwise linear regression model provides a convenient way 
of examining such interaction effects. This method allows the investi- 
gator to score his interaction terms as separate variates, omitting 
them from the stepwise analysis until the significant main effects 
of the input and environmental variables are controlled. 

A similar multistage regression analysis can be used to assess 
other kinds of interaction effects. For example, if the investigator 
wishes to analyze possible interaction effects among input variables or 
among environmental variables as well as those between input and environmental
variables, his analysis would involve separate stages in which
the various effects would be controlled in the following sequence:
main effects of input variables, interactions among input variables, 
main effects of environmental variables, interaction effects among 
environmental variables, and interactions between input and environ- 
mental variables. 
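
The five-stage sequence can be written as a loop that adds one block of variables at a time and records the increment in $R^2$ at each stage. This is a sketch with hypothetical variables: two inputs `x1` and `x2`, and two environmental measures `e1` and `e2`. The only interaction built into the simulated output is the input-by-environment term `x1 * e1`. The helpers `r2` and `stage_increments` are illustrative.

```python
import numpy as np


def r2(y, *predictors):
    """R^2 from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


def stage_increments(y, stages):
    """Enter blocks of predictors in order; return the R^2 gained at each stage."""
    entered, previous, gains = [], 0.0, {}
    for label, block in stages:
        entered.extend(block)
        current = r2(y, *entered)
        gains[label] = current - previous
        previous = current
    return gains


rng = np.random.default_rng(6)
n = 3000
x1, x2, e1, e2 = rng.normal(size=(4, n))
y = x1 + e1 + 0.5 * x1 * e1 + rng.normal(size=n)

stages = [
    ("input main effects", [x1, x2]),
    ("input x input", [x1 * x2]),
    ("environment main effects", [e1, e2]),
    ("environment x environment", [e1 * e2]),
    ("input x environment", [x1 * e1, x1 * e2, x2 * e1, x2 * e2]),
]
for label, gain in stage_increments(y, stages).items():
    print(f"{label:<27} {gain:.3f}")
```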

The assessment of interaction effects by repeated analyses of 
different subgroups rather than by the multistage analysis just 
described presents certain problems for the investigator in determin- 
ing when he has actually identified a significant "interaction" effect. 
Some investigators are tempted to conclude that they have done so in 
cases where a particular environmental variable is found to have a 
significant effect in one subgroup but not in another. It would be 
more definitive to test the significance of the difference between the two
environment-output correlations rather than simply to ascertain that one
correlation is significant but not the other. Even under these conditions,
however, differences in the sizes of the samples or in the variances of
either environmental or output variables can affect differences between
environment-output correlations. The method of separate analyses by
subgroups also presents the problems discussed in the previous section 
in connection with matching designs. 
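
One conventional way to make that test is Fisher's r-to-z transformation for correlations from two independent subgroups. This is a minimal sketch; the correlations, sample sizes, and the helper `compare_independent_correlations` are hypothetical. As the text warns, the test does not remove the influence of unequal variances on the two correlations.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 == rho2 for two independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher's r-to-z
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical environment-output correlations for two subgroups of students.
# r = .25 with n = 200 is significant on its own; r = .10 with n = 150 is not,
# yet the difference between the two is far from significant.
z, p = compare_independent_correlations(0.25, 200, 0.10, 150)
print(f"z = {z:.2f}, p = {p:.3f}")
```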

Error of Measurement 

We have already indicated that one major difficulty with matching 
designs is the bias caused by error of measurement in the matching 
variables. What happens is that, in order to match subjects on an
attribute like, say, academic ability, unrepresentative subsamples of 
students must be selected from each institution. If the institutions 
under study differ markedly in selectivity, the investigator must
select the least able students from the highly selective institutions
and the most able students from the least selective. Since the
latter are above average in ability in comparison with their classmates,
their scores are likely to contain more positive than negative
errors of measurement. By contrast, the students from the highly 
selective colleges, being below average in ability relative to their 
classmates, are likely to have more negative than positive errors of 
measurement in their scores. Under these conditions, in each matched 
pair of students, the one from the highly selective college will more 
often than not have a higher "true" score than the one from the least 
selective college. Another way of demonstrating this effect is to 
give all members of each matched pair a second, independent test. 
Students from the highly selective colleges will tend to score slightly 
higher than they did on the first test, whereas students from the least 
selective colleges will tend to score slightly lower than previously. 
Multivariate analyses avoid such errors by making it possible to
utilize all subjects from all institutions, so that within-institution
errors of measurement in the input variables sum to zero.
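
The matching artifact can be reproduced with a simple simulation. Everything in it is an assumption chosen for illustration:

- True ability is normally distributed with mean 60 at a selective college and 40 at a nonselective one, with standard deviation 10.
- The test adds independent error with standard deviation 5.
- "Matching" is approximated by keeping students from both colleges whose observed scores fall in a common band.

The matched groups look equal on the first test but not on their true scores or on a second, independent test.

```python
import random
from statistics import mean

random.seed(7)
TRUE_SD, ERROR_SD = 10.0, 5.0   # hypothetical true-score and error spreads


def students(true_mean, n):
    """(true score, first observed score, second independent observed score)."""
    out = []
    for _ in range(n):
        true = random.gauss(true_mean, TRUE_SD)
        out.append((true,
                    true + random.gauss(0.0, ERROR_SD),
                    true + random.gauss(0.0, ERROR_SD)))
    return out


def matched(group, low=48.0, high=52.0):
    """Keep students whose first observed score lies in the common band."""
    return [s for s in group if low <= s[1] <= high]


selective = matched(students(60.0, 20000))
nonselective = matched(students(40.0, 20000))

for name, group in (("selective", selective), ("nonselective", nonselective)):
    print(f"{name:<13} n={len(group):4d}  "
          f"first test {mean(s[1] for s in group):5.1f}  "
          f"true {mean(s[0] for s in group):5.1f}  "
          f"retest {mean(s[2] for s in group):5.1f}")
```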

Measurement error can introduce the opposite kind of bias if the 
fallible input measure is used also as a basis for selecting students 
for admission. Under these circumstances, students in the highly 
selective institutions will tend to have more positive than negative 
errors of measurement in their test scores, whereas the rejects from 
these colleges, who enter less selective institutions, will tend to 
have more negative than positive errors in their scores. Unless the 
investigator can estimate precisely the extent to which a particular
measure is relied upon in the admissions process, he would do well to
avoid using the measure altogether in his longitudinal analyses of
college impact.
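
The reverse bias under selective admission can be sketched the same way. The parameters are hypothetical: true scores have mean 50 and standard deviation 10, the error standard deviation is 5, and the admissions cutoff is 60 on the observed score. Admitted students end up with positive average errors and rejected students with negative ones.

```python
import random
from statistics import mean

random.seed(8)
CUTOFF = 60.0   # hypothetical admissions cutoff on the fallible test

admitted_errors, rejected_errors = [], []
for _ in range(50000):
    true = random.gauss(50.0, 10.0)
    error = random.gauss(0.0, 5.0)
    if true + error >= CUTOFF:
        admitted_errors.append(error)
    else:
        rejected_errors.append(error)

print(f"mean error, admitted: {mean(admitted_errors):+.2f}")
print(f"mean error, rejected: {mean(rejected_errors):+.2f}")
```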

An even subtler and potentially more serious consequence of measurement
error (one that arises not only in matching designs but also in
designs that use correlational methods) is attenuation in the observed
correlation between input and output variables. It is a well-known
statistical fact that error of measurement in either of a pair of
correlated variables lowers their observed correlation. To the extent
that such attenuation results from error in the input variables, a
serious bias is introduced into the analysis if the input variable is
also correlated with environmental variables.
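
The attenuation can be checked against the classical formula, under which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. This is a minimal simulation. It assumes a true correlation of .60, an input reliability of .70, and an error-free output measure, all hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 20000
true_r, reliability_x, reliability_y = 0.60, 0.70, 1.0

true_x = rng.normal(size=n)
y = true_r * true_x + np.sqrt(1 - true_r**2) * rng.normal(size=n)   # no error in y
error_sd = np.sqrt((1 - reliability_x) / reliability_x)             # var(true)/var(obs) = .70
observed_x = true_x + rng.normal(0.0, error_sd, n)

observed_r = np.corrcoef(observed_x, y)[0, 1]
predicted_r = true_r * np.sqrt(reliability_x * reliability_y)
print(f"observed r = {observed_r:.3f}, attenuation formula = {predicted_r:.3f}")
```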

The way in which this bias operates can be indicated by means of 
a hypothetical example. Assume that we are interested in determining 
how the student's achievement is affected by the "quality" of his
college. Furthermore, we have longitudinal data on students attending 
a variety of colleges of differing degrees of quality. Our output 
measure of achievement is the student's composite performance on the 
Graduate Record Examination (GRE), and our environmental measure of
quality is the percentage of faculty members holding Ph.D.'s. For the 
sake of simplicity, let us assume that there is only one relevant input 
measure: the student's composite score on the National Merit Scholarship
Qualifying Test (NMSQT). To complete our hypothetical picture,
we can make the following additional assumptions: (a) The NMSQT was
not used in making admissions decisions; (b) The NMSQT is positively
correlated with the GRE; and (c) NMSQT scores are positively correlated
with college quality. This last assumption states simply that bright 
students are more likely to attend relatively high-quality colleges. 

Our analyses of these three measures might involve simply com- 
puting the partial correlation between college quality and GRE perfor- 
mance, holding constant the effects of NMSQT scores. Or perhaps we 
might want to compute residual GRE scores (regressed on NMSQT scores) 
and plot the mean residuals for each college against the college quality 
measure (this latter type of analysis would permit us to see any non- 
linear effects of quality and also to identify individual institutions 
that might have very large mean residuals). No matter which approach
is used, however, we are likely to find that college quality has a
positive "effect" on achievement, even if there is in fact no effect.

The reason for this is that error of measurement in the NMSQT causes us 
to underestimate the correlations of NMSQT with college quality and 
GRE scores, and thereby to "undercorrect" for initial NMSQT performance. 
Thus, even though we have statistically equated the student bodies 
entering each college in terms of their mean observed NMSQT scores, we 
have not equated them in terms of their mean true NMSQT scores. Since
the adjusted "true" score is still positively correlated both with 
college quality and GRE scores, we should still expect to find a posi- 
tive correlation between quality and GRE performance. 
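
The hypothetical example can be simulated at the level of individual students. In this sketch all coefficients are assumptions:

- Brighter students attend higher-quality colleges.
- GRE performance depends only on true ability.
- The NMSQT measures ability with a reliability of .80.

The helper `partial_r` is illustrative. Partialling out true ability leaves essentially no correlation between quality and GRE. Partialling out the fallible NMSQT leaves a spurious positive one.

```python
import math
import numpy as np


def partial_r(x, y, control):
    """First-order partial correlation r_xy.control from zero-order correlations."""
    r = np.corrcoef([x, y, control])
    r_xy, r_xc, r_yc = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc**2) * (1 - r_yc**2))


rng = np.random.default_rng(10)
n = 20000
ability = rng.normal(size=n)                                      # true NMSQT score
quality = 0.5 * ability + math.sqrt(0.75) * rng.normal(size=n)    # college quality
gre = 0.7 * ability + math.sqrt(0.51) * rng.normal(size=n)        # no quality effect
reliability = 0.80
nmsqt = ability + math.sqrt((1 - reliability) / reliability) * rng.normal(size=n)

print(f"partial r(quality, GRE | true ability)   = {partial_r(quality, gre, ability):+.3f}")
print(f"partial r(quality, GRE | observed NMSQT) = {partial_r(quality, gre, nmsqt):+.3f}")
```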

These artifacts can perhaps be better illustrated in terms of
regression analysis. In simple linear regression, the slope of the
regression line (regression coefficient) is a direct function of the
correlation coefficient: $b = r \, (s_y / s_x)$ ($b$ and $r$, of course, will be
identical if the two variances are equated). Thus, if $r$ is attenuated
by error of measurement in $x$ (NMSQT scores), then $b$ will also be
attenuated. The net effect of error of measurement in our example,
then, is to flatten the slope of the regression of GRE on NMSQT scores.
Figure 3 shows the consequences of this phenomenon: If one flattens the
slope of the regression line (dotted line), he will tend to underestimate
GRE for high values of NMSQT, and to overestimate GRE for low values of
NMSQT. Therefore, if we attempt, statistically, to equate students entering
different colleges by partialling out the effects of their initial ability
(NMSQT scores) on GRE performance, the residuals for students with above-average
NMSQT scores will be too large, and those for below-average
students too small. Similarly, the mean residuals for students attending
high-quality colleges will be spuriously large (because there are more
high-ability than low-ability students at these institutions) and the mean
residuals for students attending low-quality colleges will be spuriously
small. The magnitude of these spuriously large residuals is a direct
function of the amount of measurement error in the NMSQT.

[Figure 3. True and Observed Regression of GRE Scores on NMSQT Scores. Lines: Observed Regression (error of measurement in NMSQT; dotted); True Regression (no error of measurement in NMSQT).]
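
Two of these consequences show up in a small simulation: the flattened slope and the spuriously large mean residuals at high-quality colleges. The setup is hypothetical:

- GRE has a true slope of 1.0 on true ability.
- The NMSQT has reliability .70.
- Students with higher true ability tend to attend higher-quality colleges, which have no effect of their own on GRE.

Residuals come from the regression of GRE on observed NMSQT scores. They are then averaged separately for colleges above and below average in quality.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 20000
reliability = 0.70

ability = rng.normal(size=n)                                      # true NMSQT
nmsqt = ability + np.sqrt((1 - reliability) / reliability) * rng.normal(size=n)
gre = 1.0 * ability + rng.normal(size=n)                          # true slope 1.0
quality = 0.6 * ability + 0.8 * rng.normal(size=n)                # no effect on GRE

slope, intercept = np.polyfit(nmsqt, gre, 1)                      # observed regression
residual = gre - (slope * nmsqt + intercept)
high = quality > 0

print(f"observed slope {slope:.2f} (true slope 1.00, reliability {reliability:.2f})")
print(f"mean residual, high-quality colleges: {residual[high].mean():+.3f}")
print(f"mean residual, low-quality colleges:  {residual[~high].mean():+.3f}")
```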

The same problem occurs, of course, if we compute residual college
"quality" scores: The residual environmental scores for students at
high-quality colleges are spuriously high, and those for students at
low-quality colleges spuriously low. In short, the net result of error
of measurement in the NMSQT is that we undercorrect for initial differ- 
ences in ability, thereby creating a spurious positive partial or part 
correlation between college quality and GRE performance. 

A real danger in spurious "effects" of this sort is that they are 
likely to be believed by educators and policy-makers because they con- 
firm existing theories about college impact. It is widely held, for 
example, that students get a "better education" in the "good" colleges. 
Indeed, this is one of the beliefs that attracts the brightest students 
to such colleges. Moreover, since most highly selective colleges 
manifest other traditional signs of prestige or quality (large libraries, 
distinguished faculty, competitive atmosphere, and so forth), the 
expectation that the student's intellectual development will prosper 
more in a high-quality college than in one of low quality is reinforced. 

The same "believability" of results may be created by errors of 
measurement in many other types of input variables. Thus, we would 
expect students to become relatively more "liberal" if they attend 
colleges where the students are already highly liberal. We would expect 
a student with strong science interests to maintain these interests 
during college if he attends an institution where a high proportion of
his fellow students also have strong science interests. The notion
that students tend to change in the direction of their fellow students'
dominant characteristics has been stated by Astin (1965a) and Astin
and Panos (1969) as the theory of "progressive conformity" and by 
Feldman and Newcomb (1969) as the theory of "accentuation of initial 
differences." The point here is simply that, even if such theories 
are wrong, they will tend to be confirmed by virtue of error of measure- 
ment in the student input variables. 

The fundamental importance of measurement error bias in studies of
college impact suggests that some statistical proof of the bias should be
presented. For this purpose we can use the example of the study of
the effects of college selectivity on GRE performance. The three
basic variables can be designated with subscripts as follows:

0 = GRE score
1 = NMSQT score
2 = College Quality

Let us begin by assuming that (a) the correlations among these 
three variables are all nonzero and positive (which happens to be the 
case), but (b) there is no true effect of College Quality on GRE per- 
formance. In other words, let us assume that if it were possible to 
obtain error- free measurements of all three variables, the partial 
correlation between GRE and College Quality, holding constant the effects 
of NMSQT scores, would be zero. Using the familiar formula for a 
first-order partial correlation coefficient, the true partial correlation
between quality and GRE performance would thus be:

(1) (true scores) $r_{02 \cdot 1} = \dfrac{r_{02} - r_{01} r_{12}}{\sqrt{(1 - r_{01}^2)(1 - r_{12}^2)}} = 0.$
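
Formula (1) can be checked numerically. Applied to sample correlations, it gives exactly the correlation between the residuals of variables 0 and 2 after each has been regressed on variable 1. This is a sketch with arbitrary simulated data; the coefficients and helper names are hypothetical.

```python
import math
import numpy as np


def partial_from_formula(r02, r01, r12):
    """Equation (1): first-order partial correlation r_02.1."""
    return (r02 - r01 * r12) / math.sqrt((1 - r01**2) * (1 - r12**2))


def residual(v, control):
    """Residual of v after OLS regression on `control` (with intercept)."""
    X = np.column_stack([np.ones(len(v)), control])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


rng = np.random.default_rng(12)
v1 = rng.normal(size=500)                          # variable 1 (partialled)
v2 = 0.4 * v1 + rng.normal(size=500)               # variable 2
v0 = 0.6 * v1 + 0.2 * v2 + rng.normal(size=500)    # variable 0

r = np.corrcoef([v0, v1, v2])
by_formula = partial_from_formula(r[0, 2], r[0, 1], r[1, 2])
by_residuals = np.corrcoef(residual(v0, v1), residual(v2, v1))[0, 1]
print(f"{by_formula:.6f} {by_residuals:.6f}")
```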

Since the denominator of this formula is always nonzero and positive 
(except in the limiting case of a zero-order correlation of 1.0 between
two of the variables), the numerator must be zero to satisfy the re- 
quirement that the partial coefficient be zero. Thus, our hypothetical 
situation of no true effect of college selectivity requires that: 

(2) (true scores) $r_{02} - r_{01} r_{12} = 0.$

Our hypothesis about the biasing effects of measurement error is
that the partial correlation based on the observed scores of these
three variables will be nonzero and positive:

(3) (observed scores) $r_{02} - r_{01} r_{12} > 0.$

Our proof of (3) will consist simply of showing what happens when
error of measurement is introduced into (2). First we must be able to
estimate the correlations among true scores as shown in (2). In order
to do this we need to know what the reliabilities of the three variables
($r_{00}$, $r_{11}$, and $r_{22}$) are. For purposes of discussion we shall assume
that all three reliabilities are imperfect but nonzero:

(4) $0 < r_{00} < 1$

(5) $0 < r_{11} < 1$

(6) $0 < r_{22} < 1$

Since the true correlation between two variables is equal to the
ratio between their observed correlation and the geometric mean of
their reliabilities, formula (2) can now be expressed in terms of
correlations among observed scores as follows:

(7) $\dfrac{r_{02}}{\sqrt{r_{00} r_{22}}} - \dfrac{r_{01}}{\sqrt{r_{00} r_{11}}} \cdot \dfrac{r_{12}}{\sqrt{r_{11} r_{22}}} = 0,$

or,

(8) $\dfrac{1}{\sqrt{r_{00} r_{22}}} \left( r_{02} - \dfrac{r_{01} r_{12}}{r_{11}} \right) = 0.$

Note that, in order for the left side of the equation to equal
zero, the two terms within the parentheses must be equal, i.e.:

(9) $r_{02} = \dfrac{r_{01} r_{12}}{r_{11}}$

Now if we compute the partial correlation between GRE and College
Quality using observed scores but make no correction for unreliability
in our measures, only the right side of (9) is affected; that is, we omit
the correction for unreliability and the denominator becomes 1.0. (Making
no correction for unreliability is equivalent to assuming that there are
no measurement errors and that the reliability of the variable is 1.0.)
Since $r_{11} < 1.0$ from (5):

$$r_{01}\,r_{12} < \frac{r_{01}\,r_{12}}{r_{11}} \tag{10}$$

and

$$r_{02} > r_{01}\,r_{12} > 0 \tag{11}$$

Consequently, when no correction is made, the parenthetical term 
in (8) becomes nonzero and positive. Since the two reliabilities, $r_{00}$
and $r_{22}$, are also nonzero and positive, the entire left side of equation
(8) and, hence, the observed partial correlation between GRE and college
selectivity becomes nonzero and positive. In short, failure to adjust
for error of measurement in the NMSQT will lead to the conclusion that
achievement is favorably affected by college quality when there is no
true effect.
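A small Monte Carlo sketch may make the argument more concrete. Everything in it is our own assumption rather than data from the study:

- normally distributed scores;
- true correlations of .7 between NMSQT and GRE and .5 between NMSQT and College Quality;
- hypothetical reliabilities of .9, .7, and .9 for GRE, NMSQT, and College Quality.

GRE and College Quality are generated so that they are related only through NMSQT. The true-score partial correlation is therefore zero. The partial computed from the error-laden observed scores nevertheless comes out clearly positive.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r02, r01, r12):
    """First-order partial correlation r_{02.1}, formula (1)."""
    return (r02 - r01 * r12) / math.sqrt((1 - r01 ** 2) * (1 - r12 ** 2))


def add_error(true_scores, reliability, rng):
    """Add independent normal error so that var(true) / var(observed) equals `reliability`.

    Assumes the true scores have unit variance."""
    sd_error = math.sqrt((1 - reliability) / reliability)
    return [t + rng.gauss(0, sd_error) for t in true_scores]


def simulate(n=20000, reliabilities=(0.9, 0.7, 0.9), seed=1):
    """Return (true-score partial, observed-score partial) for GRE (0) and Quality (2), NMSQT (1) held constant."""
    rng = random.Random(seed)
    t1 = [rng.gauss(0, 1) for _ in range(n)]                                # true NMSQT
    t0 = [0.7 * a + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0, 1) for a in t1]  # true GRE: depends on NMSQT only
    t2 = [0.5 * a + math.sqrt(1 - 0.5 ** 2) * rng.gauss(0, 1) for a in t1]  # true quality: depends on NMSQT only
    x0, x1, x2 = (add_error(t, rel, rng) for t, rel in zip((t0, t1, t2), reliabilities))
    true_partial = partial_corr(pearson(t0, t2), pearson(t0, t1), pearson(t1, t2))
    observed_partial = partial_corr(pearson(x0, x2), pearson(x0, x1), pearson(x1, x2))
    return true_partial, observed_partial


true_p, observed_p = simulate()
print(f"true-score partial: {true_p:.3f}   observed-score partial: {observed_p:.3f}")
```

Setting the NMSQT reliability (the middle entry of reliabilities) to 1.0 should bring the observed partial back to approximately zero. That result points to error in the partialled variable as the source of the spurious effect.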

Several additional interesting conclusions can be deduced from
formula (8):

1. The principal source of bias results from measurement error in
the partialled variable (variable #1 in equation 8). Failure
to adjust for such bias can:

a. Create spurious "effects" when there are no true effects. 

b. Exaggerate the magnitude of true effects. 

c. Reverse the sign of the true effect.

2. The direction (sign) of the bias resulting from errors of
measurement in the partialled variable depends on the signs
of its correlations with the other variables ($r_{01}$ and $r_{12}$).

3. Failure to adjust for measurement error in either the inde- 
pendent or dependent variable will only attenuate the observed 
partial correlation between the independent and dependent 
variable. 
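These conclusions can also be checked without simulation by using the classical attenuation relation $r_{ij} = \rho_{ij}\sqrt{r_{ii}\,r_{jj}}$. Here $\rho_{ij}$ (our notation) denotes the correlation between true scores. The sketch below uses made-up true correlations that imply a small negative true effect of College Quality:

- error in the partialled variable alone reverses the sign (conclusion 1c);
- error confined to variables 0 and 2 only shrinks the partial toward zero (conclusion 3).

```python
import math


def observed_partial(rho01, rho12, rho02, rel0, rel1, rel2):
    """Partial r_{02.1} computed from observed scores, given true correlations rho_ij and reliabilities.

    Uses the classical attenuation relation r_ij = rho_ij * sqrt(rel_i * rel_j)."""
    r01 = rho01 * math.sqrt(rel0 * rel1)
    r12 = rho12 * math.sqrt(rel1 * rel2)
    r02 = rho02 * math.sqrt(rel0 * rel2)
    return (r02 - r01 * r12) / math.sqrt((1 - r01 ** 2) * (1 - r12 ** 2))


# Hypothetical true correlations: rho02 < rho01 * rho12, so the true partial is slightly negative.
true_rho = {"rho01": 0.7, "rho12": 0.5, "rho02": 0.33}

true_p = observed_partial(**true_rho, rel0=1.0, rel1=1.0, rel2=1.0)        # error-free measures
reversed_p = observed_partial(**true_rho, rel0=0.9, rel1=0.8, rel2=0.9)    # error in the partialled variable
attenuated_p = observed_partial(**true_rho, rel0=0.8, rel1=1.0, rel2=0.8)  # error only in variables 0 and 2

print(f"true {true_p:+.3f}  error in variable 1: {reversed_p:+.3f}  error in 0 and 2 only: {attenuated_p:+.3f}")
```

Expanding the observed numerator explains conclusion 2:

$$\sqrt{r_{00}\,r_{22}}\,\bigl[(\rho_{02} - \rho_{01}\rho_{12}) + \rho_{01}\rho_{12}(1 - r_{11})\bigr]$$

The bias term vanishes when $r_{11} = 1$ and otherwise carries the sign of $\rho_{01}\rho_{12}$.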

Several remedies can be employed to compensate for error of measure- 
ment in the input variables when multivariate techniques are used to 
evaluate college impact. A remedy for the two-variable case (input
and output) has been proposed by Tucker, Damerin, and Messick (1966). 

An appropriate generalization of their approach to the multivariate 
case would be to compute correlation matrices using the variance in the 
"true" (rather than observed) scores of all independent variables (stu- 
dent input as well as environmental). Since the true variance in a set 
of scores is simply the product of the observed variance and the relia- 
bility of the measure, it would be relatively simple to make such 
corrections if the reliabilities of all of the input and environmental 
variables were known. Estimates of reliability are usually available 
for psychometric devices, though not for the demographic and other 
types of questionnaire data that characterize so much of the research 
on college impact. Recognizing the need for this information, the 
research staff of ACE is currently engaged in an extensive empirical
study to obtain estimates of error of measurement in questionnaire data 
(Boruch and Creager, 1970). 
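One reading of the proposed generalization is sketched below under our own assumptions. The steps are:

1. Start from an observed covariance matrix.
2. Replace the diagonal entry of each independent variable with its estimated true variance (observed variance times reliability).
3. Rescale the matrix to correlations.

Off-diagonal covariances are left alone because random errors are assumed to be uncorrelated across variables. The function names and the reliabilities used are hypothetical.

```python
import math


def corrected_correlations(cov, reliabilities):
    """Correlation matrix in which each variable's observed variance is replaced by its
    estimated true variance (observed variance x reliability).

    Off-diagonal covariances are left unchanged because random errors are assumed to be
    uncorrelated across variables. A reliability of 1.0 leaves that variable uncorrected."""
    k = len(cov)
    true_var = [cov[i][i] * reliabilities[i] for i in range(k)]
    return [[1.0 if i == j else cov[i][j] / math.sqrt(true_var[i] * true_var[j]) for j in range(k)]
            for i in range(k)]


def partial_02_1(r):
    """Partial correlation of variables 0 and 2 with variable 1 held constant, formula (1)."""
    return (r[0][2] - r[0][1] * r[1][2]) / math.sqrt((1 - r[0][1] ** 2) * (1 - r[1][2] ** 2))


# Hypothetical setup: 0 = GRE, 1 = NMSQT, 2 = College Quality, all with unit observed variance.
reliability = [0.9, 0.8, 0.85]
true_corr = {(0, 1): 0.7, (1, 2): 0.5, (0, 2): 0.35}  # satisfies condition (2): .35 = .7 x .5
observed = [[1.0] * 3 for _ in range(3)]
for (i, j), rho in true_corr.items():
    observed[i][j] = observed[j][i] = rho * math.sqrt(reliability[i] * reliability[j])

uncorrected = partial_02_1(observed)
# Correct the independent variables only (NMSQT and College Quality); GRE keeps reliability 1.0.
corrected = partial_02_1(corrected_correlations(observed, [1.0, reliability[1], reliability[2]]))
print(f"uncorrected {uncorrected:.3f}, corrected {corrected:.3f}")
```

Leaving GRE uncorrected only rescales the correlations that involve it, so the zero partial implied by condition (2) is still recovered. When reliabilities are estimated from samples, a matrix corrected in this way is not guaranteed to be positive definite. It is prudent to check it before further analysis.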

Environmental Measurement 

In the early days of research on college impact, investigators 
were primarily concerned with the significance of "the college experience" 
in a generic sense. That is, they were little interested in differences 
among different types of colleges and, by implication, they dealt with 
how "going to college" compares with "not going to college." However, 
with the steadily growing proportions of young people who go on to some 
form of higher education and with many city and state systems moving 
toward open admissions, the question of the impact of college (versus 
no college) is becoming more an academic than a practical question. In 
short, with the greatly expanding higher educational opportunities and 
the extraordinary diversity among institutions, the question of the 
impact of college is increasingly coming to be one of the comparative 
impact of different types of college experiences. 

Environmental measures in comparative studies of college impact 
function chiefly to provide a basis for interpreting any observed dif- 
ferential effects. The simplest form of environmental measurement is 
simply to compare the effects of one college with the effects of another. 
The environmental "measure" in such studies is merely a dichotomy: 
college A versus college B. While such studies may prove interesting 
to the persons immediately concerned with the institutions being com- 
pared, the crudeness of this environmental measurement greatly limits 
the generalizability of the findings beyond the two institutions. To 
illustrate this problem, let us assume that we are interested in comparing
the differential impact of a given state university and a given private 
liberal arts college on the development of students. We find that the 
liberal arts college, relative to the state university, increases the 
student's aspirations to go on to graduate school after completing the 
baccalaureate degree. (For purposes of discussion, we shall assume 
that this is indeed a differential effect of the two colleges and not 
an artifact arising from our failure to control adequately differential 
inputs to the two institutions.) How can we explain this observed 
difference in the relative impact of the two institutions? Is it be- 
cause their faculties differ in the encouragement they give to students 
to go on for advanced training, or is it because the stimulation provided
by the student peer groups differs? Can the result be caused by
more subtle institutional differences in living conditions, type of
college town, or administrative practices? Perhaps the relative 
neglect of undergraduate instruction at the state university makes the 
student cynical about the importance of graduate education. Clearly,
the observed differential effect of the two institutions is subject to
a variety of interpretations.

In this example, then, merely having available a large number of 
environmental measures on each of the two colleges does not resolve 
the interpretive dilemma, since there is no way to determine which of 
the various environmental attributes was responsible for the observed 
difference in institutional impact. Lacking any empirical way of 
choosing among the various measures, the investigator is forced to 
rely on a purely clinical or intuitive explanation of the observed effect.

The most obvious solution to this dilemma is to study simultaneously
a much larger number of colleges with widely differing environmental 
characteristics. Ideally, the number would be large enough to permit 
reasonably reliable correlations using the institution as the unit of 
analysis. The relative contributions of the various environmental charac- 
teristics to the prediction of differential institutional impact would 
serve as an empirical basis for determining which environmental attri- 
butes are causally related to the student outputs under study. 
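A minimal sketch of this institution-level strategy, on simulated data of our own, follows. The steps are:

1. Regress output on a single student input across all students.
2. Average the residuals within each college.
3. Correlate these mean residuals (one per college) with each environmental measure.

Only the hypothetical measure env_a has a true effect in the simulated data, and only it shows a substantial correlation. A real study would, of course, use many inputs and many environmental measures.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mean_residuals(students):
    """students: (college_id, input_score, output_score) tuples.

    Regresses output on input across all students, then averages the residuals within
    each college; the mean residual serves as that college's differential impact."""
    xs = [s[1] for s in students]
    ys = [s[2] for s in students]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    totals, counts = {}, {}
    for college, x, y in students:
        residual = y - (my + slope * (x - mx))
        totals[college] = totals.get(college, 0.0) + residual
        counts[college] = counts.get(college, 0) + 1
    return {college: totals[college] / counts[college] for college in totals}


# Hypothetical data: 100 colleges, 200 students each; only env_a truly affects the output.
rng = random.Random(3)
env_a = [rng.gauss(0, 1) for _ in range(100)]
env_b = [rng.gauss(0, 1) for _ in range(100)]
students = []
for c in range(100):
    for _ in range(200):
        x = rng.gauss(0.5 * env_a[c], 1)  # colleges high on env_a also enrol higher-input students
        y = 0.6 * x + 0.3 * env_a[c] + rng.gauss(0, 1)
        students.append((c, x, y))

impact = mean_residuals(students)
effects = [impact[c] for c in range(100)]
r_a, r_b = pearson(env_a, effects), pearson(env_b, effects)
print(f"college-level r with env_a: {r_a:.2f}, with env_b: {r_b:.2f}")
```

Input and env_a are correlated in these data. The pooled regression slope therefore absorbs part of the environmental effect, and the mean residuals understate it somewhat. This is one more reason to interpret such residuals with care.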

Environmental measures are of two basic types: (1) the characteristics
of the total institution (its size, selectivity, permissiveness,
etc.), which can, in theory at least, affect all students at the institution,
and (2) special educational experiences within the college (living
in a particular dormitory, having a particular roommate, participating
in an honors program, etc.) to which not all students at a given institution
are exposed. This latter category comprises within-college environmental
variables, whereas the former comprises between-college environmental
variables.

Between-College Measures 

Several instruments have been devised for measuring characteristics 
of college environments. In many ways, these instruments resemble per- 
sonality inventories designed for assessing the traits of individuals; 
they include the College Characteristics Index (CCI) (Pace and Stern, 1958; 
Stern, 1963); the College and University Environmental Scales (CUES)
(Pace, 1960, 1963); the Environmental Assessment Technique (EAT) (Astin
and Holland, 1961); and the Inventory of College Activities (ICA) (Astin,
1968a). Reviews and discussions of these instruments have appeared
elsewhere (Astin, 1968a; Menne, 1967). Each presents certain problems
in inferring causation that merit some discussion.

These various instruments embody three conceptually different
approaches to the assessment of environmental characteristics: the
"image" approach of the CCI and the CUES, the "student characteristics"
approach of the EAT, and the "stimulus" approach of the ICA.

In the image approach, observers (usually students) are asked to 
report their impressions of what the college is like. The answers of 
all respondents at a particular institution are aggregated or averaged 
for each item, and the items are grouped into scales to form the en- 
vironmental measures. Although the CCI and CUES use between 15 and 30 
items per scale, data from the ICA (Astin, 1968a) indicate that highly 
reliable estimates of college "image" factors can be obtained with only 
two or three items. Apparently, when scale scores are averaged or 
aggregated across relatively large numbers of individuals, the advan- 
tages to be gained from basing each scale on a large number of items 
diminish. 
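The aggregation argument can be illustrated with a toy simulation. It is our own construction, not the ICA data. The assumed model is:

- each institution has a true standing on an image factor;
- every student's answer to every item adds large individual error;
- two separate 2-item scales are scored by averaging over all respondents at the institution.

The correlation between the two scales across institutions is a parallel-forms reliability. It approaches 1 as the number of respondents grows, even though each scale has only two items.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def parallel_scale_correlation(n_students, n_institutions=200, noise_sd=2.0, seed=7):
    """Correlate two separate 2-item image scales across institutions.

    Assumed model (illustrative only): each response = institution's true image + N(0, noise_sd) error.
    Each scale score is the mean of its two items over all respondents at the institution."""
    rng = random.Random(seed)
    scale_a, scale_b = [], []
    for _ in range(n_institutions):
        image = rng.gauss(0, 1)  # institution's true standing on the factor
        total_a = total_b = 0.0
        for _ in range(n_students):
            # Two items per scale; every answer carries independent individual error.
            total_a += 2 * image + rng.gauss(0, noise_sd) + rng.gauss(0, noise_sd)
            total_b += 2 * image + rng.gauss(0, noise_sd) + rng.gauss(0, noise_sd)
        scale_a.append(total_a / (2 * n_students))
        scale_b.append(total_b / (2 * n_students))
    return pearson(scale_a, scale_b)


for n in (2, 20, 200):
    print(f"{n:4d} respondents per institution: r = {parallel_scale_correlation(n):.2f}")
```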

The student characteristics approach is based essentially on an
interpersonal theory of environmental influence. The objective is thus
to assess the average or modal characteristics of the students at each
institution. Although the EAT is based on only eight measures of the
student body (size, ability, and six measures of personality), the
possible number of relevant student characteristics is actually much
larger.

The stimulus approach to measuring college environments was developed 
primarily because of certain interpretative difficulties connected with 
the image and student characteristic approaches. In the stimulus 
approach, the environment is seen as consisting of all of the stimuli
that are capable of changing the student's sensory input. A "stimulus"
is defined as "any behavior, event, or other observable characteristic
of the institution capable of changing the student's sensory input,
the existence or occurrence of which can be confirmed by independent
observation" (Astin, 1968a, p. 5). This definition suggests that
neither the college image nor the personal characteristics of the
students satisfies the criterion of a potential stimulus. Thus, although
the student's perception of his environment may influence his
behavior toward his fellow students, his perception alone cannot
function as a stimulus for others. Similarly, the student's intelligence,
attitudes, values, and other personal characteristics do not
constitute stimuli by this definition, even though such traits may be
manifested in certain behaviors which can in turn serve as stimuli
for fellow students. The stimulus approach was thus developed in the 
belief that environmental measures based on such information would 
provide a better conceptual basis for interpreting causal relationships 
than either the image or student characteristics approaches. 

A major difficulty presented by the image approach to measuring 
environmental characteristics is that the student's perception of his 
college can be influenced not only by what the college is really like 
but also by how it has influenced him. Thus, if a particular image 
factor is found to "affect" some student outcome, we cannot be sure
that we have adequately explained the observed effect, simply because
the student's perceptions may have been influenced by the effect itself.
This interpretative problem is well illustrated in a recent empirical
study of differential college effects on the student's tendency to
drop out of college (Astin and Panos, 1969). The environmental variable 
involved was Concern for the Individual Student, one of the eight image 
factors from the ICA (Astin, 1968a). It is defined by three image items: 
the percentage of students who rate the college environment as "warm"
(scored positively), the percentage of students who agree with the
statement that "most students are more like 'numbers in a book'"
(scored negatively), and the percentage of students who agree with the
statement, "I felt 'lost' when I first came to the campus" (scored 
negatively). This environmental measure correlated .38 with the mean 
residual (from input variables) percentage of students who stayed in 
college during the four years following matriculation. Although one 
might be tempted to interpret this finding in a direct causal sense 
(that is, it seems plausible that students are more likely to remain
in college if their institution shows concern for them), it is also 
possible that the institution's score on this measure is the result 
rather than the cause of the student's persistence at an institution. 
That is, in responding to these college image questions, students who 
have already dropped out, or are about to, may be more inclined to 
report that their institution shows little concern for them than stu- 
dents who have made up their minds to stay in college. Of course, if 
the image factor has higher correlations with the outcome criterion than 
does any other environmental factor (which happens to be true in this 
particular case), then the investigator can more safely infer that the 
environmental measure is an antecedent rather than a consequence of the 
output variable under investigation. 
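To make the aggregation of image items concrete, the sketch below computes one college's score on the three items that define Concern for the Individual Student. The item keys are hypothetical. The unit weighting (percentage on the positive item minus the percentages on the two negative items) is our assumption; the ICA's actual scoring may standardize or weight the items differently.

```python
def percent(responses, item):
    """Percentage of respondents answering yes to `item`."""
    return 100.0 * sum(1 for r in responses if r[item]) / len(responses)


def concern_for_individual(responses):
    """Hypothetical unit-weighted score for the Concern for the Individual Student factor.

    responses: one dict per student at a college, with boolean items
      'warm': rates the college environment as warm (scored positively)
      'numbers_in_book': agrees students are like 'numbers in a book' (scored negatively)
      'felt_lost': agrees 'I felt lost when I first came to the campus' (scored negatively)
    """
    return (percent(responses, "warm")
            - percent(responses, "numbers_in_book")
            - percent(responses, "felt_lost"))


college = [
    {"warm": True, "numbers_in_book": False, "felt_lost": True},
    {"warm": True, "numbers_in_book": False, "felt_lost": False},
    {"warm": True, "numbers_in_book": True, "felt_lost": False},
    {"warm": False, "numbers_in_book": False, "felt_lost": False},
]
print(concern_for_individual(college))  # 75.0 - 25.0 - 25.0 = 25.0
```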

A fourth category of between-institution environmental information,
one which has not as yet been used in developing an inventory to
measure college characteristics, is the structural and organizational
characteristics of the institution. There are many such measures that 
might be used, both qualitative and quantitative. Among the more 
common qualitative measures are type of control, religious affiliation, 
type of curriculum, highest degree conferred, geographic region, sex, 
and race of the institution. Among the many possible quantitative 
measures are size of the institution, faculty-student ratio, tuition
charges, endowment, operating budget, research funds, percentage of 
Ph.D.'s on the faculty, and library size. Such characteristics pre- 
sent certain interpretative difficulties because they are remote from 
the student and his development. From a practical standpoint, however, 
they are of particular importance, being more amenable to direct manipu- 
lation than are most of the measures that characterize the various envir- 
onmental inventories. This fact suggests that we badly need to do 
research on the manner in which these structural and administrative characteristics
affect the college environment and, in turn, the development
of the student. Creager and Sell (1969) have taken a step in this 
direction by developing a master institutional file, which contains 
measures of many structural and administrative attributes for all 
colleges in the population. 

In one recent multi-institution longitudinal study, Astin and Panos
(1969) compared image, personal characteristics, and stimulus measures
in terms of their effectiveness in accounting for differential institutional
impact on the undergraduate student's educational and career

Within-College Measures

Some educators object to the use of between-college measures, such
as those discussed in the preceding section, on the grounds that a 
measure of the environment of the total institution may be a poor reflec- 
tion of the environment actually encountered by individual students. 

Since there are unquestionably many distinct subenvironments within 
given institutions, especially within the complex universities, measures 
of the "total" institutional environment will confound these distinctions. 
One practical problem in using between-institution measures to describe
subenvironments within institutions is to define the appropriate envir- 
onmental subunits. This task may be relatively simple in many univer- 
sities, where colleges or schools are well-defined, although the mere 
existence of such colleges or schools does not necessarily mean that 
they are functionally independent. In some universities, for example, 
the students attending the technical college have little or no contact
with students or professors in any of the other colleges. On the other 
hand, in other universities, such students may live in dormitories and 
attend classes with students from a variety of other colleges. Never- 
theless, if functionally independent subunits within institutions can 
be identified, there seems to be no reason why these units cannot be 
treated as separate "institutions" in the analysis of between-college 
environmental effects. 

There are a great many within-college environmental experiences that 
cut across organizational subunits like colleges or schools. The methodological
challenge to the researcher is to identify such experiences,
devise an appropriate means for measuring them, and determine whether or 
not each of his subjects encountered these experiences during college.

Below is just a partial list of the many types of within-college environ- 
mental experiences which can affect the student's development: 

1. The characteristics of individual professors and individual 
courses.

2. His course of study. 

3. The amount of time he spends at various activities (studying, 
outside reading, recreation, sleeping, etc.). 

4. The type and amount of counseling or advisement he receives. 

5. His participation in special educational programs (honors 
program, year abroad, Washington semester, undergraduate research 
participation, etc.). 

6. His living arrangements (dormitory, fraternity house, commuting 
from home, private apartment). 

7. Number and types of his roommates. 

8. His use of drugs (tranquilizers, barbiturates, hallucinogens, 
narcotics, etc.). 

9. The type and amount of financial aid he receives. 

10. The hours he works and the type of work he does. 

11. His marital status and number of children. 

12. The availability to him of a private automobile. 

Information about most of these within-environment experiences can
be obtained directly from the student by means of a follow-up questionnaire.
However, self-reports present certain potential dangers, depending upon 
the nature of the experience. Those experiences that require relatively 
little interpretation, such as whether the student had a scholarship, 
seem to present few problems; however, reports of how many hours the
student spent in outside reading, or ratings of his professors or
roommates, may be systematically biased, thereby producing interpretive 
ambiguities similar to those described earlier for measures of the col- 
lege "image." In theory, of course, no student’s report of his own 
environmental experiences can be regarded as experimentally independent 
of data on his input and output characteristics. In practice, however, 
the bias resulting from this lack of independence is probably minimal, 
as long as the environmental experience being reported is relatively 
objective and not open to misinterpretation by the student. 

A related problem concerns the "randomness" of the experience
itself. The basic problem here is to determine the extent to which the 
student himself was directly responsible for his being exposed to the 
particular experience, or, in cases where exposure was determined by 
others, the extent to which their decision was based on a knowledge 
of the student's input characteristics. In some colleges, for example, 
the assignment of students to dormitories, classes, professors, and 
other experiences is virtually random or at least haphazard. In others, 
the student has almost complete control over where he lives, who his 
roommates are, and what courses he takes. The most difficult interpre- 
tative problems arise in the case of experiences over which the student 
has very direct control (drug use, for example). 

This discussion of environmental measurement indicates that a step- 
wise analysis of college impact would successively control the effects 
of various independent variables on the output variable in the following 
logical sequence:

1. Main effects of student input variables.

2. Interaction effects among student input variables.

3. Main effects of within-college environmental variables.

4. Interaction effects among within-college variables.

5. Interaction effects between student input and within-college
environmental variables.

6. Main effects of between-college environmental variables.

7. Interaction effects among between-college environmental variables.

8. Interaction effects between student input and between-college
environmental variables.

9. Interaction effects between within-college and between-college
environmental variables.

Of course, the higher-order interaction effects among student input,
within-college, and between-college variables can also be studied, although
the possible number is so large that the investigator would ordinarily
limit his research to those higher-order effects that test
specific hypotheses.
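A minimal sketch of such a stepwise analysis is given below. It uses ordinary least squares on simulated data of our own. Predictor blocks are entered in a fixed order, and the increment in $R^2$ is recorded at each step. With a single input variable and a single within-college experience, only steps 1, 3, 5, and 6 of the sequence arise. The variable names, effect sizes, and sample sizes are all hypothetical.

```python
import random


def r_squared(columns, y):
    """R^2 from an ordinary least-squares fit of y on `columns` plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def blockwise_increments(blocks, y):
    """Enter predictor blocks in the listed order; return (block name, R^2 increment) pairs."""
    entered, previous, increments = [], 0.0, []
    for name, cols in blocks:
        entered = entered + cols
        current = r_squared(entered, y)
        increments.append((name, current - previous))
        previous = current
    return increments


rng = random.Random(11)
college_env = [rng.gauss(0, 1) for _ in range(50)]  # hypothetical between-college measure
x, w, e, y = [], [], [], []
for c in range(50):
    for _ in range(40):
        xi = rng.gauss(0, 1)                           # student input
        wi = 1.0 if xi + rng.gauss(0, 1) > 0 else 0.0  # within-college experience, e.g. honors program
        x.append(xi)
        w.append(wi)
        e.append(college_env[c])
        y.append(0.6 * xi + 0.3 * wi + 0.3 * college_env[c] + rng.gauss(0, 1))

steps = [
    ("1. input main effects", [x]),
    ("3. within-college main effects", [w]),
    ("5. input x within-college interaction", [[a * b for a, b in zip(x, w)]]),
    ("6. between-college main effects", [e]),
]
for name, gain in blockwise_increments(steps, y):
    print(f"{name:40s} R^2 increment = {gain:.3f}")
```

In real data the students at a college are not independent observations. The between-college steps are therefore probably better judged against the number of institutions than against the number of students.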

Methods of Data Collection 

One methodological question which has received very little attention 
is the technique used for collecting empirical data in studies of college 
effects. Unfortunately, logistical considerations often limit the 
techniques available to an investigator. One difficulty in multi- 
institution studies, for example, is that the conditions for collecting 
data may vary systematically from institution to institution, particularly 
if the tests or questionnaires are administered to groups of students. 
Important institutional biases may be introduced by variations in the 
instructions given, in the manner in which students' questions are answered,
in the time allotted for completing the task, and in the physical surroundings
where the task is carried out. The major problem with such biases is, of 
course, that they affect most of the students at a given institution and 
thereby introduce systematic error into the interinstitutional comparisons. 

Serious systematic errors of this kind are probably more likely to 
occur with group follow-ups or post-testing than with initial group 
pretesting, since the general environmental conditions associated with 
freshman orientation and registration (assuming that this is the period 
during which the pretest is administered) are probably much the same at most 
institutions. Moreover, students are likely to be more cooperative about 
providing information and completing forms at this time than at any 
other. In addition, the persons who administer the forms and the other 
students who are completing them are likely to be anonymous individuals
to the new student, wherever he enrolls. In the case of follow-up or
post-testing, however, the situation is usually very different. Students
may have to be assembled in some ad hoc fashion to complete the
task, thus introducing biases with respect to the time and place of 
testing. Some students who are well along into their senior year may 
strongly resent being asked to spend time on such a task. If the testing 
is carried out in existing classes, important biases may be introduced 
either because the professor resents the intrusion on his class time 
or because of the classroom interaction that has already developed during 
the term. 

The most serious problem with group follow-up testing at the institution
is that it excludes dropouts and early graduates. As a consequence,
generalizations concerning environmental influences are limited to what
may be the least affected group of students. A more subtle problem is
that the effects of college on the student's tendency to drop out may be
confounded with college "effects" on other outcomes. If dropout-proneness
is correlated with changes in other student outcomes, then
those institutions that encourage potential dropouts to stay in college 
will also appear to "affect" these other outcomes. One protection for 
the researcher here would be to see if these other "effects" hold up 
once he has controlled for the institution's dropout rate. In other 
words, he might use the institution's dropout rate as a kind of input 
or control variable. 

Perhaps the most realistic alternative to group follow-up testing 
at the institution is the mailed questionnaire sent to the student's 
home. This technique permits the investigator to follow up all or a
random sample of all entering students, including those who have dropped 
out or transferred. In one sense, the self-administered questionnaire 
represents the most unstandardized of all data collection techniques.

The real methodological advantage here, of course, is that these ex- 
treme variations in the conditions of administration are confounded 
in the interinstitutional comparisons. The reduction in precision that 
results from this confounding is a small price to pay in order to elimi- 
nate the systematic biases that almost inevitably result from follow- 
ups carried out at the institution. It should be pointed out, however, 
that mailed questionnaires cannot be used for follow-up assessments that 
require proctoring (e.g., achievement testing). 



The principal methodological limitation of the mailed questionnaire 
is that not all students will cooperate, and that those who do complete
and return the questionnaire are likely to be a biased subsample of the 
total group to whom questionnaires are sent. Several solutions to 
this problem are possible. In the first place, it should be recognized 
that nonrespondent bias is likely to have a greater effect on marginal 
tabulations of data than on associational measures. In fact, some 
evidence (Astin, 1968a, Appendix B) suggests that interinstitutional 
intercorrelations of questionnaire items are virtually unaffected by 
nonrespondent bias. Whatever biases do exist, however, can be compensated
for in certain ways. If pretest or input data are available on all 
subjects (as is usually the case), they can be utilized to develop 
compensatory weights to be applied to the respondents' data. The basic 
idea here is to give relatively more weight to those respondents who 
most closely resemble the nonrespondents in their input characteristics. 
The weights can be developed from actuarial tables (Astin and Panos,
1969) or from regression analyses (Astin, 1970). The major objective
of the regression analysis is to produce a set of weights which, when 
applied to the input data of the respondents, yields marginal tabula- 
tions on all items which are identical to the original marginals based 
on all subjects (respondents and nonrespondents). While there is no 
guarantee that such differential weighting will compensate for all 
significant nonrespondent biases, it will unquestionably eliminate or 
at least reduce some of the bias. 
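The actuarial-table approach can be sketched as simple cell weighting, under assumptions of our own. Subjects are grouped into cells defined by their input characteristics. Each respondent is then weighted by the inverse of the response rate in his cell. The data and function names below are hypothetical. By construction, the weighted respondents reproduce the full-sample distribution of the cell-defining inputs. The regression-based weights mentioned in the text are not implemented here.

```python
from collections import Counter


def compensatory_weights(cells, responded):
    """Weight = (all subjects in the respondent's input cell) / (respondents in that cell).

    cells:     one hashable input category per subject (e.g. an ability-by-parental-education cell)
    responded: one boolean per subject
    Nonrespondents get weight 0. Assumes every cell contains at least one respondent."""
    total = Counter(cells)
    answered = Counter(c for c, r in zip(cells, responded) if r)
    return [total[c] / answered[c] if r else 0.0 for c, r in zip(cells, responded)]


def weighted_proportion(values, weights, target):
    """Weighted share of subjects whose value equals `target`."""
    return sum(w for v, w in zip(values, weights) if v == target) / sum(weights)


# Hypothetical follow-up in which able students respond more often than less able ones.
ability = ["high"] * 6 + ["low"] * 6
returned = [True, True, True, True, True, False, True, True, False, False, False, False]
weights = compensatory_weights(ability, returned)
respondents = [a for a, r in zip(ability, returned) if r]
print(respondents.count("high") / len(respondents))   # unweighted: 5 of 7 respondents are "high"
print(weighted_proportion(ability, weights, "high"))  # weighted: the full-sample share of 0.5, up to rounding
```

For items other than the cell-defining inputs, such weighting reduces nonrespondent bias only to the extent that those items are related to the inputs. This is consistent with the caution above.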

A more familiar technique for dealing with nonrespondent bias is 
to conduct additional, more intensive, follow-ups of subsamples of 
nonrespondents by means of special delivery or registered mail, telegrams,
telephone calls, or other methods. An examination of the data thus
secured can provide estimates of the possible effects of nonrespondent 
bias in the larger sample. If the bias proves to be serious enough, 
the same intensive follow-up procedures should be applied to the total 
sample. This method can also be used to assess the impact of the dif- 
ferential weighting described above. If the differential weights tend 
to make the follow-up marginals more consistent with the data as revealed 
by the intensive follow-up, then it can be assumed that the weighting 
procedure is working. In fact, the weighted marginals may be even 
more valid than the data from an intensive follow-up in which responses 
are obtained from, say, 85 percent of the subsample. The investigator 
would have reason to believe that such was the case if the weighted and 
unweighted marginals tended to straddle the corresponding marginals 
from his intensive follow-up subsample. Whether the weighted marginals 
were plausible would, of course, have to be assessed in light of the 
likely (or even possible) change in the values of the marginals that 
would occur if a 100 percent response could be obtained. 
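The two checks described in this paragraph reduce to simple comparisons of marginals, as the sketch below shows. The function names are hypothetical. So are the percentages, which stand for something like the percentage of students planning graduate study under each procedure.

```python
def straddles(weighted, unweighted, intensive):
    """True when the intensive follow-up marginal lies between the weighted and unweighted marginals."""
    return min(weighted, unweighted) <= intensive <= max(weighted, unweighted)


def weighting_helps(weighted, unweighted, intensive):
    """True when weighting moves the follow-up marginal closer to the intensive follow-up marginal."""
    return abs(weighted - intensive) < abs(unweighted - intensive)


# Hypothetical percentages planning graduate study under each procedure.
unweighted, intensive, weighted = 42.0, 38.0, 35.0
print(weighting_helps(weighted, unweighted, intensive), straddles(weighted, unweighted, intensive))
```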

There have been very few empirical studies of the effects of com- 
pensatory weights on data. As was mentioned earlier, such weighting is 
probably important, if not essential, in reporting the marginal tabu- 
lations of respondent data. It has been shown, for example, that those 
students who do respond to follow-ups, in comparison with those who do 
not, are brighter, achieve at a higher level, are more motivated, and 
have more highly educated parents (Astin and Panos, 1969; Astin, 1970). 
Consequently, unweighted follow-up marginals are almost sure to be biased 
with respect to any item having to do with either ability or SES. (These 
findings also suggest that investigators who intend to use mail follow-up
techniques should routinely plan to collect student input information
on ability, past achievement, and parental education.) Very little is 
known, however, about the effects of weighting on associational measures. 
Because of our current ignorance on this matter, it would seem that, 
at a minimum, investigators should consider repeating some of their 
causal analyses using both weighted and unweighted follow-up
data. 

Although the systematic errors that result from group administration 
of instruments may seriously bias the analyses of comparative institutional 
impact, the investigator has at his disposal several techniques for 
detecting such biases. Obviously, he must make sure that the procedure 
used by the institution is thoroughly documented. When he finds that 
certain individual institutions show especially pronounced "effects" 
that cannot be accounted for by measurable environmental attributes, 
he should suspect that biases are operating. (To explore this question, 
he would need a large number of institutions and a comprehensive set of 
environmental measures.) The investigator will also have reason to 
suppose that systematic biases are present if the institution in ques- 
tion shows peculiarly large "effects" on highly subjective or judgmental 
outcomes. Presumably, the student's report of relatively factual outcomes,
such as his final field of study or his marital status, is not
as likely to be affected by situational factors as is his report of, say, 
his personal values or attitudes. In particular, the student's ratings
or subjective judgments of his college are probably most sensitive to
variations in the institutional circumstances under which these judgments
are obtained. Routine inclusion of a few such items in every group-administered
questionnaire may serve as a good check on the possibility
of systematic biases. 
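The first check described above can be made concrete in several ways; the following is one sketch, under stated assumptions. For each institution, take its apparent "effect" (for example, the mean residual output after student inputs are controlled). Subtract the effect predicted from its measured environmental attributes. Then flag institutions whose leftover departure is unusually large. The institution codes, the values, and the cutoff of 2.0 standard deviations are all hypothetical choices, not part of the original procedure.

```python
from statistics import mean, stdev

def flag_suspect_institutions(observed, predicted, cutoff=2.0):
    """Flag institutions whose observed 'effect' departs unusually far from
    the effect predicted by their measured environmental attributes.

    observed, predicted: dicts mapping institution -> effect estimate.
    cutoff: hypothetical threshold, in standard deviations of the departures.
    """
    names = list(observed)
    gaps = [observed[n] - predicted[n] for n in names]
    m, s = mean(gaps), stdev(gaps)
    return [n for n, g in zip(names, gaps) if abs(g - m) / s > cutoff]

# Hypothetical data for ten institutions.
names = [f"C{i:02d}" for i in range(1, 11)]
predicted = [0.2, 0.0, -0.1, 0.3, 0.1, 0.0, 0.2, -0.2, 0.1, 0.0]
departures = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 1.0]
observed = [p + d for p, d in zip(predicted, departures)]

print(flag_suspect_institutions(dict(zip(names, observed)),
                                dict(zip(names, predicted))))  # ['C10']
```

The screen could be run separately on factual outcomes and on subjective ratings of the college. An institution flagged only on the subjective items would show the pattern the text associates with systematic administration biases.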

Resolving Some Inferential Dilemmas 

No matter how elegant his design, the investigator can never be 
absolutely sure that he has isolated "true" college effects. But there 
are certain situations in which he can have a good deal of confidence 
that his data are indeed revealing environmental influences. 

One situation which justifies a high degree of confidence is that 
in which the environmental variable is uncorrelated with the input variables.
For example, one college environmental characteristic that has
no relationship, or only a very low one, with most student input variables 
is institutional size (Astin, 1965b). In other words, students who go 
to large institutions differ very little from those who go to small ones. 
Consequently, the observed effect of size on some student output measure 
is almost certainly not just an artifact of the researcher's failure 
to control input differences. 
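The usual formula for a first-order partial correlation shows why this case inspires confidence. Write O for the output, E for the environmental variable, and I for a single input variable. This is a simplification, since actual analyses control many inputs at once.

```latex
r_{OE\cdot I} = \frac{r_{OE} - r_{OI}\,r_{EI}}{\sqrt{\left(1 - r_{OI}^{2}\right)\left(1 - r_{EI}^{2}\right)}},
\qquad
r_{EI} = 0 \;\Longrightarrow\; r_{OE\cdot I} = \frac{r_{OE}}{\sqrt{1 - r_{OI}^{2}}} .
```

When r_EI is zero, the numerator is unchanged. The denominator can then only leave the partial correlation equal to r_OE or larger in magnitude, so controlling the input cannot explain the environment-output relationship away.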

A related situation arises when the correlation between environment 
and output is substantially higher than the correlation between environment
and input. Ideally, one would like to see the correlation between
environment and output increase as differential inputs are controlled.
(The more usual situation, of course, is that the environment-output relationship
shrinks consistently as input variables are successively controlled.)
Thus, if the environment-output correlation increases, or at least holds 
its own, as input variables are controlled, the investigator can be 
reasonably confident that his observed environmental effect is a true 
one. However, if the correlation between environment and output diminishes
consistently as input variables are controlled until only a small rela- 
tionship remains, there is a very real possibility that if one or two 
other input variables had been included in the analyses, the relation- 
ship would have disappeared altogether. 
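This check can be sketched as a sequence of environment-output partial correlations, recomputed as each additional input variable is controlled. The Python sketch below uses the standard recursive formula for higher-order partial correlations. The variable names and correlation values are hypothetical, chosen only to show the "shrinking" pattern the text warns about.

```python
import math
from functools import lru_cache

def make_partial(R):
    """Return partial(x, y, controls), computed recursively from the
    zero-order correlations in R (a dict of dicts); controls is a tuple."""
    @lru_cache(maxsize=None)
    def partial(x, y, controls):
        if not controls:
            return R[x][y]
        *rest, z = controls
        rest = tuple(rest)
        r_xy, r_xz, r_yz = partial(x, y, rest), partial(x, z, rest), partial(y, z, rest)
        return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return partial

def env_output_trace(R, output, env, inputs):
    """Environment-output correlation with 0, 1, 2, ... inputs controlled, in order."""
    partial = make_partial(R)
    return [partial(output, env, tuple(inputs[:k])) for k in range(len(inputs) + 1)]

def corr_matrix(pairs):
    """Build a symmetric dict-of-dicts correlation matrix from {(a, b): r}."""
    names = {n for pair in pairs for n in pair}
    R = {a: {a: 1.0} for a in names}
    for (a, b), r in pairs.items():
        R[a][b] = R[b][a] = r
    return R

# Hypothetical correlations among output O, environment E, and inputs I1, I2.
R = corr_matrix({("O", "E"): 0.30, ("O", "I1"): 0.50, ("E", "I1"): 0.40,
                 ("O", "I2"): 0.40, ("E", "I2"): 0.30, ("I1", "I2"): 0.50})
print([round(r, 3) for r in env_output_trace(R, "O", "E", ["I1", "I2"])])
# [0.3, 0.126, 0.104]
```

A trace that climbs or holds steady favors a true environmental effect. A trace that shrinks steadily, as in these made-up numbers, suggests that one or two further inputs might remove the relationship altogether.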

Perhaps the strongest evidence for true causation exists when the 
direction (sign) of a particular environmental effect is the opposite
of the zero-order correlation between that environmental variable and the
output measure. Although rare, this reversal in sign has been observed
in at least one study (Astin and Panos, 1969). In this study, Cohesiveness,
an ICA environmental measure reflecting primarily the proportion
of students who report having many close friends among their fellow 
students, was shown to have a positive effect on the student's chances 
of staying in college. The zero-order correlation between Cohesiveness 
and the percentage of students remaining in college for four years, 
however, was negative (r = -.13). When differential student input variables
were controlled, the correlation reversed sign (partial r = +.25). The
explanation of this apparent paradox is that students who go to highly
cohesive institutions are, on the average, more dropout-prone than are the
students who go to the less cohesive institutions. Consequently, the
positive relationship between the dropout-proneness of entering freshman
classes and the Cohesiveness of institutional
environments masks the negative effect of Cohesiveness on the individual 
student's chances of dropping out. 
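The arithmetic of such a reversal can be checked with the first-order partial correlation formula. In the Python sketch below, only the zero-order value of -.13 comes from the study. The two correlations involving a single input variable (entering dropout-proneness) are hypothetical. Because the study controlled many input variables at once, the sketch is not meant to reproduce the reported partial of +.25.

```python
import math

def partial_r(r_oe, r_oi, r_ei):
    """First-order partial correlation of output O with environment E, controlling input I."""
    return (r_oe - r_oi * r_ei) / math.sqrt((1 - r_oi**2) * (1 - r_ei**2))

# O = staying in college, E = Cohesiveness, I = entering dropout-proneness.
r_oe = -0.13   # zero-order correlation reported in the text
r_oi = -0.50   # hypothetical: dropout-prone entrants are less likely to persist
r_ei = 0.40    # hypothetical: cohesive colleges enroll more dropout-prone students

print(round(partial_r(r_oe, r_oi, r_ei), 3))  # 0.088 (sign reversed)
```

The product r_OI * r_EI is negative here, so subtracting it pushes the numerator from negative to positive. This is the masking mechanism the text describes.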

Research data that reveal significant interaction effects represent 
another situation where causal inferences can be made with more than 
usual confidence. Consider, for example, a situation where a measure
of, say, the peer environment is found to have a significant main 
effect on some outcome. If this apparent effect is indeed a true one
rather than an artifact, then we would expect to find that the effect
is stronger among resident students (who would have more contact
with their fellow students and thus would be more affected by their
characteristics) than among commuters. Therefore, if it can be
shown that there is a significant interaction effect involving
resident-commuter status and the particular peer environmental measure in
question, then the conclusion that the environmental attribute is causally
related to the outcome is strengthened. By the same reasoning, we should not
expect to find such interactions with measures of, say, the classroom
environment, since both residents and commuters presumably have equal
exposure to such environmental factors by virtue of attending classes.

Similar checks on the validity of causal inferences can be made 
by examining many other types of interaction effects. For example, 
extroverts or gregarious students are presumably more susceptible to 
the effects of peer factors than are introverted or shy students. To 
take another example, the magnitude of a particular effect should in- 
crease the longer the student is at the college. The point is simply 
that, for many of the apparent environmental effects that may be ob- 
served in longitudinal studies, it is possible to hypothesize the 
existence of certain interaction effects which, if subsequently con- 
firmed by additional analyses, would lend support to the assumption 
that the relationship is indeed a causal one. 
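The resident-commuter check can be sketched as a comparison of regression slopes. Suppose the outcome is regressed on the peer environmental measure, a resident dummy variable, and their product. The coefficient of the product term then equals the slope for residents minus the slope for commuters, so the Python sketch below simply fits the two slopes separately. The data are hypothetical. An actual analysis would also need a standard error for the difference before treating the interaction as significant.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def interaction_estimate(env, outcome, resident):
    """Return (resident slope, commuter slope, difference).

    The difference equals the product-term coefficient in a regression of
    outcome on env, a resident dummy, and env * resident.
    """
    res = [(e, y) for e, y, r in zip(env, outcome, resident) if r]
    com = [(e, y) for e, y, r in zip(env, outcome, resident) if not r]
    b_res = ols_slope(*zip(*res))
    b_com = ols_slope(*zip(*com))
    return b_res, b_com, b_res - b_com

# Hypothetical peer-environment scores and outcomes for ten students.
env = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
resident = [True] * 5 + [False] * 5
outcome = [12, 14, 16, 18, 20, 10.5, 11, 11.5, 12, 12.5]

print(interaction_estimate(env, outcome, resident))  # (2.0, 0.5, 1.5)
```

The same comparison applies to the other interactions mentioned above. For example, the dummy could mark gregarious versus shy students, or early versus late years of attendance.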
Summary

The purpose of this paper has been to review some of the major 
methodological problems in the design of studies of college impact. 

To facilitate the discussion of design problems, one may view the
question of college impact in terms of three components: student
outputs, student inputs, and environmental characteristics. Any problem
in the design of college impact studies can be seen in terms of
the relationships between these three components.

The major goal of college impact studies is to minimize three
kinds of inferential error: type I and type II errors (the traditional
inferential errors of experimental design), and type III errors, which
are defined as inferential statements which simultaneously involve
both type I and type II errors. Type III errors are possible in college
impact research primarily because of the highly nonrandom distribution
of students among institutions.

Much of the previous research on college impact has resulted in 
ambiguous findings primarily because at least one of the three informational
components was missing. The single-institution study, through
input and output information, indicates how the student changes during
college, but it provides no information bearing directly on environmental
impact. The multi-institution cross-sectional study provides information
on the relationship between environments and outputs, but it is
highly susceptible to type I and type III errors unless student input
data are also collected.

The most definitive information about college impact is obtained 
from multi-institution longitudinal studies in which data on student
inputs, student outputs, and environmental characteristics are obtained.
Such data can be analyzed by a variety of "quasi-experimental" designs
(path analysis, for example), although stepwise linear multiple regression
analysis is perhaps the most flexible and versatile method,
particularly if the regression is carried out in separate "stages" 
dictated by the logic of the college impact process. 
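Regression in stages can be sketched as follows. The output is first regressed on the input variables alone, then on inputs plus environmental variables. The increase in R² is the share of variance credited to the environment after inputs are controlled. This Python sketch simplifies under stated assumptions:

- each block is entered whole, in a fixed order, rather than by a stepwise selection rule within blocks;
- the least-squares fit uses a small hand-written solver so the sketch stays self-contained;
- the data are hypothetical.

```python
def r_squared(X_cols, y):
    """R^2 from an ordinary least-squares regression of y on X_cols plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def staged_r2(inputs, environments, y):
    """Stage 1: inputs only. Stage 2: inputs plus environment. Returns both R^2s and the gain."""
    r2_inputs = r_squared(inputs, y)
    r2_full = r_squared(inputs + environments, y)
    return r2_inputs, r2_full, r2_full - r2_inputs

# Hypothetical data: one input (ability), one environmental measure, one output.
ability = [1, 2, 3, 4, 5, 6, 7, 8]
env = [2, 1, 4, 3, 6, 5, 8, 7]
output = [3, 4, 6, 5, 9, 8, 12, 10]
r2_in, r2_full, gain = staged_r2([ability], [env], output)
print(round(r2_in, 3), round(r2_full, 3), round(gain, 3))
```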

The variance in student output in the multi-institution studies
can be assigned to four sources: error, variance uniquely attributable
to input variables, variance uniquely attributable to environmental
variables, and confounded variance. Variance uniquely attributable
to input variables can be defined as the squared multiple part correlation
between the output measure and the residual input measures. Variance
uniquely attributable to environmental sources can be defined as the
squared multiple part correlation between the output measure and the
residual environmental measures. Confounded variance is defined as
the remainder of the total predictable output variance (that is, the
final R² minus the two squared part correlations). The two squared part
correlations can be used as "lower-bounds" estimates of the total output
variance attributable to a particular source, whereas the squared part
correlation plus the confounded variance can be used as an "upper-bounds"
estimate. Additional part correlations and confounded variance estimates
can be obtained for interaction effects, if desired.
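Given three R² values (inputs alone, environment alone, and both together), this partition can be computed directly. The squared multiple part correlation of the output with the residual environmental measures equals the gain in R² when the environmental block is added to the inputs, and likewise for the input block. The Python sketch below uses hypothetical R² values.

```python
def partition_variance(r2_inputs, r2_env, r2_full):
    """Split output variance into unique, confounded, and error components.

    unique_input = R^2(full) - R^2(environment only)
    unique_env   = R^2(full) - R^2(inputs only)
    confounded   = R^2(full) - unique_input - unique_env
    Bounds for each source: (unique, unique + confounded).
    """
    unique_input = r2_full - r2_env
    unique_env = r2_full - r2_inputs
    confounded = r2_full - unique_input - unique_env
    return {
        "unique_input": unique_input,
        "unique_env": unique_env,
        "confounded": confounded,
        "error": 1 - r2_full,
        "input_bounds": (unique_input, unique_input + confounded),
        "env_bounds": (unique_env, unique_env + confounded),
    }

parts = partition_variance(r2_inputs=0.30, r2_env=0.12, r2_full=0.34)  # hypothetical
for name in ("unique_input", "unique_env", "confounded", "error"):
    print(name, round(parts[name], 2))
```

With these values the four components come out to .22, .04, .08, and .66. The environmental share therefore lies between .04 and .12, and the upper bound equals the R² for the environmental variables alone. The confounded component can turn negative when suppressor variables are present (footnote 8).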

One of the most serious sources of potential bias in college effects
studies, regardless of the method of analysis used, is error of
measurement in the input variables. Unless corrections are made for such
errors, the investigator runs the risk of finding spurious college
"effects." Such spurious effects tend to be highly believable, in that
they ordinarily support the most plausible theory of how students are
affected by their colleges. Since appropriate adjustments for such 
measurement error require a knowledge of the reliability of each in- 
put variable, researchers engaged in studies of college impact should 
consider routinely collecting such reliability information on each 
instrument that they use. 
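The following Python sketch shows how unreliability in an input can manufacture an apparent effect. It uses one common adjustment, chosen here as an illustrative assumption rather than as the specific correction discussed in this paper. The correlations involving a fallible input are corrected for attenuation by dividing them by the square root of the input's reliability, and only then is the partial correlation computed. The correlations and the reliability of .80 are hypothetical. The sketch also assumes that the errors are uncorrelated with everything else and ignores sampling error in the reliability estimate.

```python
import math

def partial_r(r_oe, r_oi, r_ei):
    """First-order partial correlation of output O with environment E, controlling input I."""
    return (r_oe - r_oi * r_ei) / math.sqrt((1 - r_oi**2) * (1 - r_ei**2))

def corrected_partial_r(r_oe, r_oi, r_ei, reliability_i):
    """Partial correlation after correcting the input's correlations for attenuation."""
    k = math.sqrt(reliability_i)
    r_oi_c, r_ei_c = r_oi / k, r_ei / k
    if abs(r_oi_c) >= 1 or abs(r_ei_c) >= 1:
        raise ValueError("corrected correlation out of range; check the reliability estimate")
    return partial_r(r_oe, r_oi_c, r_ei_c)

r_oe, r_oi, r_ei = 0.35, 0.50, 0.50   # hypothetical correlations
print(round(partial_r(r_oe, r_oi, r_ei), 3))                  # 0.133
print(round(corrected_partial_r(r_oe, r_oi, r_ei, 0.80), 3))  # 0.055
```

Most of the apparent environmental effect here comes from under-controlling the fallible input. This is the kind of believable but spurious "effect" the text warns about.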

The generalizability and usefulness of information about college
impact depend heavily on the number and kinds of environmental
measurements used. The principal function of environmental measurement 
in research on college impact is to provide an interpretive frame of 
reference for any significant effects that might be observed. Although 
the most popular approaches to environmental measurement have been 
based on student perceptions of the environment, such measures present 
interpretive difficulties in view of the possibility that the effect 
itself may have influenced the student's perception of his institution. 
One possible solution to this problem is to develop environmental 
measures based on directly observable events rather than on perceptions.
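An environmental measure based on observable events can be formed by aggregating students' yes/no reports of whether a particular event occurred. This follows the spirit of the Cohesiveness measure described earlier, the proportion of students reporting many close friends among their fellow students. The Python sketch below assumes hypothetical institution codes and a single hypothetical event item.

```python
from collections import defaultdict

def institutional_proportions(records):
    """Proportion of each institution's students who report a given observable event.

    records: iterable of (institution, reported) pairs, with reported coded 1 or 0.
    """
    tallies = defaultdict(lambda: [0, 0])  # institution -> [reports, students]
    for institution, reported in records:
        tallies[institution][0] += reported
        tallies[institution][1] += 1
    return {inst: hits / n for inst, (hits, n) in tallies.items()}

# Hypothetical reports of one observable event at two institutions, A and B.
records = [("A", 1), ("A", 0), ("A", 1), ("A", 1),
           ("B", 0), ("B", 1), ("B", 0), ("B", 0)]
print(institutional_proportions(records))  # {'A': 0.75, 'B': 0.25}
```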

In spite of the many methodological and logical problems inherent 
in research on college impact, several checks and precautions are 
available to the investigator that will reduce his chances of committing
inferential errors. 
Footnotes

1 This study was supported by grants from the National Institute of Mental
Health and the National Science Foundation, and by general funds from the
American Council on Education. The author is indebted to Alan E. Bayer,
Robert F. Boruch, and David E. Drew for their criticisms of an earlier
draft of this paper, and especially to John A. Creager for his suggestions 
concerning the section on measurement error. Portions of this paper were 
presented at the Annual Meeting of the American Educational Research 
Association, Minneapolis, 1970. 

2 This model, which was originally presented in Astin (1965a), has been
adapted for a program of research in higher education (Astin and Panos, 
1966) and for a more general model of educational evaluation (Astin and 
Panos, 1970). 

3 In statistical terms, "opposite" means that there has been an error of
direction or sign in rejecting a two-tailed null hypothesis. A similar
idea has been proposed earlier by Kaiser (1960).

4 See Astin, Panos, and Creager (1966); Panos, Astin, and Creager (1967);
Creager, Astin, Boruch, and Bayer (1968); and Creager, Astin, Bayer,
Boruch, and Drew (1969).

5 Part correlations, which are not described in most textbooks on statistics,
may be unfamiliar to some readers. Simple zero-order correlations involve
two unadjusted variables. Partial correlations involve two residual var- 
iables (the residuals having been calculated from a third variable or 
set of variables). Part correlations involve one unadjusted variable and 
one residual variable. 
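The distinction drawn in this footnote can be sketched with the standard first-order formulas. In the Python sketch below, x stays unadjusted for the part correlation, y is residualized on a third variable z, and the correlation values are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Both x and y residualized on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def part_corr(r_xy, r_xz, r_yz):
    """x left unadjusted; only y residualized on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

r_xy, r_xz, r_yz = 0.50, 0.60, 0.40  # hypothetical zero-order correlations
print(r_xy)                                     # zero-order: 0.5
print(round(partial_corr(r_xy, r_xz, r_yz), 3))  # partial: 0.355
print(round(part_corr(r_xy, r_xz, r_yz), 3))     # part: 0.284
```

The two adjusted coefficients share a numerator and differ only in the denominator. As a result, the part correlation can never exceed the partial correlation in magnitude.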

6 Actuarial analysis also bears certain similarities to analysis of variance.
A major difference, of course, is that the cells (treatments) are formed
in actuarial analysis on the basis of a knowledge of the dependent (output)
variable, rather than a priori.

7 The first method, where student inputs are controlled using the student
as the unit of analysis, would not entirely partial out environmental 
effects under these circumstances, since the within-college regressions 
would not fall along the common or pooled regression line. Thus, the 
residuals in colleges with initially high mean input scores would tend 
to be positive, whereas the residuals in colleges with relatively low 
mean input scores would tend to be negative. 

8 A suppressor variable is one whose addition to a set of independent 
variables increases the beta weight associated with one or more of the 
variables in the set. 

9 It should be noted, however, that even these part correlations are not 
independent (Creager, 1970). For a fuller treatment of the problem of 
collinearity, see Creager and Boruch (1970).

10 There may, of course, be interactions among input variables or among
environmental variables; these types of interaction effects can be
dealt with by rescoring the variables involved. 

11 These interpretive ambiguities inherent in "comparative" studies of 
this type have been discussed at length by Cronbach (1963). 

12 An interesting approach to defining any individual student's peer
environment has been described by Rossi (1966) and employed by Wallace (1963).
Briefly, this technique develops environmental measures from the aggre- 
gated characteristics only of those fellow students who are close friends 
or associates of the student in question. 
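The general idea summarized in this footnote can be sketched as follows: a student's peer measure is the mean of some characteristic over only that student's close friends, rather than over the whole student body. This is not Rossi's or Wallace's actual procedure. The data structures, student identifiers, and characteristic in the Python sketch below are hypothetical.

```python
from statistics import mean

def friend_based_peer_measure(student, friends, trait):
    """Mean of a peer characteristic over only the student's named close friends.

    friends: dict mapping student id -> list of friend ids (hypothetical structure).
    trait:   dict mapping student id -> characteristic score.
    Returns None when none of the named friends has a score.
    """
    scored = [trait[f] for f in friends.get(student, []) if f in trait]
    return mean(scored) if scored else None

# Hypothetical campus of five students with a 0/1 characteristic.
trait = {"s1": 1, "s2": 0, "s3": 1, "s4": 0, "s5": 0}
friends = {"s1": ["s3"], "s2": ["s4", "s5"]}

print(mean(trait.values()))                           # whole-campus measure: 0.4
print(friend_based_peer_measure("s1", friends, trait))  # 1
print(friend_based_peer_measure("s2", friends, trait))  # 0
```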